Mehdi Amini via llvm-dev
2016-Sep-07 17:35 UTC
[llvm-dev] [RFC] One or many git repositories?
Hi,

> On Sep 7, 2016, at 10:30 AM, dag at cray.com wrote:
>
> Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> Right, we actually have a proposal to take what is in the current SVN
>> repo here: http://llvm.org/svn/llvm-project/ and migrate this to a
>> single repository.
>> I was not sure if you were referring to this proposal (monorepo) or to
>> the recent emails about “external libraries” that GCC uses like gmp
>> and mpfr.
>>
>> You can find more details here: https://reviews.llvm.org/D24167
>>
>> If you have some good reasons why you would think a proposal would be
>> problematic to you, or one would better fit your workflow, feel free
>> to expose them now.
>
> It could be problematic for us depending on how the monorepository is
> structured. We reference the LLVM git repository directly and use it to
> migrate to new versions, pick patches, etc. If LLVM proper were part of
> a larger repository that becomes more difficult to do because the commit
> file paths won't match. We'd be back to essentially manual diff+patch
> which is quite a step backward from the smooth git-oriented process we
> use now.

Can you clarify what you mean? Which part of the process would require
manual patching that wouldn’t otherwise?

> The document says that the individual git repositories will remain.
> Does that mean the monorepository is using git-submodule to manage the
> aggregate repository?

First, have you read this document: https://reviews.llvm.org/D24167 ?

TLDR: The answer is no. You have to see it as it is today, i.e. a single
SVN repo containing all the sub-projects, and “exports” in individual
repositories. The same thing after: a single git repo containing all the
subprojects side-by-side and the *same* “exports” in individual
repositories.

> If so that should work for us. I'm more
> concerned about the case where the individual repositories' histories
> were interwoven into a single repository and the individual repositories
> went away.
>
> I have extensive experience transitioning a very large project from a
> set of individual repositories to a single repository where we interwove
> the individual histories. It was the right direction for us but I don't
> think it would be for LLVM.
>
> I completely understand the benefits of a monorepository. One of the
> biggest for us was the ability to git-bisect across components. How
> does git-bisect work with submodules? I have very little experience
> with submodules but would like to learn more.

Fairly easy, the document mentions it in the examples.

-- Mehdi
Mehdi Amini <mehdi.amini at apple.com> writes:

> It could be problematic for us depending on how the monorepository
> is structured. We reference the LLVM git repository directly and
> use it to migrate to new versions, pick patches, etc. If LLVM
> proper were part of a larger repository that becomes more
> difficult to do because the commit file paths won't match. We'd be
> back to essentially manual diff+patch which is quite a step
> backward from the smooth git-oriented process we use now.
>
> Can you clarify what you mean? Which part of the process would require
> manual patching that wouldn’t otherwise?

If the monorepository is not using submodules but is instead a weaving
of the histories of each component, that means each tree item pointing
to a blob will have a different path. For example,
lib/Target/X86/X86InstrInfo.cpp would become
llvm/lib/Target/X86/X86InstrInfo.cpp or something similar. IME git
doesn't deal well with applying changes to blobs that exist in different
paths in the repository. That makes sense since the hashes directly
depend on the information in the trees.

> The document says that the individual git repositories will
> remain. Does that mean the monorepository is using git-submodule
> to manage the aggregate repository?
>
> First, have you read this document: https://reviews.llvm.org/D24167 ?

Yes, though I was only able to figure out how to see an actual document
by clicking "download raw diff." I'm not sure that's giving me the
latest version. Is there another convenient way to view the document,
preferably with the Markdown rendered?

It's not completely clear to me how the monorepository would be created,
and thus, how it would be structured. I understand each component gets
its own subdirectory. I'm talking about how the underlying history is
represented.

> TLDR: The answer is no: you have to see it as it is today, i.e. a
> single SVN repo containing all the sub-projects, and “exports” in
> individual repositories.

So the SVN version isn't using externals? I haven't ever looked at that
repository. I didn't even know it existed until reading the document.

> The same thing after: a single git repo containing all the subprojects
> side-by-side and the *same* “exports” in individual repositories.

How are those exports managed? Do you use a tool to filter the history
for a directory in the monorepository and then export that to its own
repository?

> I completely understand the benefits of a monorepository. One of
> the biggest for us was the ability to git-bisect across
> components. How does git-bisect work with submodules? I have very
> little experience with submodules but would like to learn more.
>
>
> Fairly easy, the document mentions it in the examples.

Ok, I probably skimmed that part since it wasn't directly related to
describing how the repository would be structured. I'll go back and
read it in more detail.

Thanks!

-David
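The path mismatch David describes can be illustrated with a small sketch.
It is not anything from the proposal: the function name and the sample
patch are made up for illustration, and it only rewrites the header lines
of a git-style unified diff so that a patch written against the standalone
repository names files under a monorepo subdirectory such as `llvm/`. It
assumes paths without spaces and does not guard against hunk content that
happens to look like a header line.

```python
import re


def prefix_patch_paths(patch_text: str, prefix: str) -> str:
    """Return patch_text with every a/... and b/... header path moved under prefix.

    Hypothetical helper for illustration only; paths containing spaces
    and "/dev/null" entries are deliberately left untouched.
    """
    out = []
    for line in patch_text.splitlines():
        if line.startswith("diff --git a/"):
            # "diff --git a/<path> b/<path>" -> insert the prefix after a/ and b/
            line = re.sub(r" ([ab])/", rf" \1/{prefix}/", line)
        elif line.startswith("--- a/") or line.startswith("+++ b/"):
            # "--- a/<path>" or "+++ b/<path>": the marker is six characters long
            line = line[:6] + prefix + "/" + line[6:]
        out.append(line)
    return "\n".join(out) + "\n"


sample_patch = (
    "diff --git a/lib/Target/X86/X86InstrInfo.cpp b/lib/Target/X86/X86InstrInfo.cpp\n"
    "--- a/lib/Target/X86/X86InstrInfo.cpp\n"
    "+++ b/lib/Target/X86/X86InstrInfo.cpp\n"
    "@@ -1 +1 @@\n"
    "-old line\n"
    "+new line\n"
)
rewritten = prefix_patch_paths(sample_patch, "llvm")
print(rewritten)
```

In practice git can do this itself: `git apply --directory=<root>` (also
accepted by `git am`) prepends a directory to every filename in a patch, so
a script like this is mainly useful for seeing what the remapping amounts to.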
Mehdi Amini via llvm-dev
2016-Sep-08 17:50 UTC
[llvm-dev] [RFC] One or many git repositories?
> On Sep 8, 2016, at 10:32 AM, dag at cray.com wrote:
>
> Mehdi Amini <mehdi.amini at apple.com> writes:
>
>> It could be problematic for us depending on how the monorepository
>> is structured. We reference the LLVM git repository directly and
>> use it to migrate to new versions, pick patches, etc. If LLVM
>> proper were part of a larger repository that becomes more
>> difficult to do because the commit file paths won't match. We'd be
>> back to essentially manual diff+patch which is quite a step
>> backward from the smooth git-oriented process we use now.
>>
>> Can you clarify what you mean? Which part of the process would require
>> manual patching that wouldn’t otherwise?
>
> If the monorepository is not using submodules but is instead a weaving
> of the histories of each component, that means each tree item pointing
> to a blob will have a different path. For example,
> lib/Target/X86/X86InstrInfo.cpp would become
> llvm/lib/Target/X86/X86InstrInfo.cpp or something similar. IME git
> doesn't deal well with applying changes to blobs that exist in different
> paths in the repository. That makes sense since the hashes directly
> depend on the information in the trees.
>
>> The document says that the individual git repositories will
>> remain. Does that mean the monorepository is using git-submodule
>> to manage the aggregate repository?
>>
>> First, have you read this document: https://reviews.llvm.org/D24167 ?
>
> Yes, though I was only able to figure out how to see an actual document
> by clicking "download raw diff." I'm not sure that's giving me the
> latest version. Is there another convenient way to view the document,
> preferable with the Markdown rendered?

Sure, what about a PDF?

> It's not completely clear to me how the monorepository would be created,
> and thus, how it would be structured. I understand each component gets
> its own subdirectory. I'm talking about how the underlying history is
> represented.

Similarly to http://llvm.org/svn/llvm-project/

>> TLDR: The answer is no: you have to see it as it is today, i.e. a
>> single SVN repo containing all the sub-projects, and “exports” in
>> individual repositories.
>
> So the SVN version isn't using externals? I haven't ever looked at that
> repository. I didn't even know it existed until reading the document.

What I am referring to here is: http://llvm.org/svn/llvm-project/

The SVN repo is a monorepo where the history of the subprojects is
“weaved”. And we are still able to export to individual git
repositories. This is what the new monorepo would be, except that the
source would be git instead of SVN, and we would continue to synchronize
to http://llvm.org/git/llvm.git

>> The same thing after: a single git repo containing all the subprojects
>> side-by-side and the *same* “exports” in individual repositories.
>
> How are those exports managed? Do you use a tool to filter the history
> for a directory in the monorepository and then export that to its own
> repository?

Yes. There are multiple ways to do that actually. Conceptually, you can
think about it as using `git diff` and `patch -p1` to take every commit
to the monorepo and reapply it on the individual repo. The easiest way
to achieve it though is probably the facility embedded in git itself:
`git filter-branch --subdirectory-filter llvm`

https://git-scm.com/docs/git-filter-branch

Also, since GitHub offers SVN access, you can view the monorepo as
offering the same SVN access we have today. So the individual git
repository can also be just `git svn` on a subdirectory of the SVN view
of the monorepo on GitHub (I’m not sure this sentence is totally clear).

>> I completely understand the benefits of a monorepository. One of
>> the biggest for us was the ability to git-bisect across
>> components. How does git-bisect work with submodules? I have very
>> little experience with submodules but would like to learn more.
>>
>>
>> Fairly easy, the document mentions it in the examples.
>
> Ok, I probably skimmed that part since it wasn't directly related to
> describing how the repository would be structured. I'll go back and
> read it in more detail.

Do not hesitate to ask if anything is unclear.

-- Mehdi

-------------- next part --------------
A non-text attachment was scrubbed...
Name: GitHubMove.pdf
Type: application/pdf
Size: 221734 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160908/dfd36bd8/attachment-0001.pdf>
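The "take every commit to the monorepo and reapply it on the individual
repo" idea can be sketched conceptually. The snippet below is only an
illustration, not the tooling under discussion: the function name and the
sample paths are hypothetical. It shows how the files touched by one
monorepo commit could be grouped by top-level directory, with that
directory stripped, which is roughly the view each per-project export
would see.

```python
from collections import defaultdict


def split_by_subproject(changed_paths):
    """Group monorepo-relative paths by top-level directory.

    Hypothetical helper: returns {subproject: [paths relative to it]}.
    Files sitting directly at the monorepo root belong to no subproject
    and are skipped.
    """
    per_project = defaultdict(list)
    for path in changed_paths:
        project, _, rest = path.partition("/")
        if rest:
            per_project[project].append(rest)
    return dict(per_project)


# One monorepo commit touching two subprojects (made-up file list).
commit_paths = [
    "llvm/lib/IR/Value.cpp",
    "clang/lib/Sema/Sema.cpp",
    "llvm/include/llvm/IR/Value.h",
    "README.md",
]
exports = split_by_subproject(commit_paths)
for project, files in exports.items():
    print(project, files)
```

Here a single monorepo commit yields one change set for `llvm` and one for
`clang`, matching the description of one SVN commit producing a commit in
each git mirror.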
Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> writes:

> First, have you read this document: https://reviews.llvm.org/D24167 ?
>
> TLDR: The answer is no: you have to see it as it is today, i.e. a
> single SVN repo containing all the sub-projects, and “exports” in
> individual repositories.

> The same thing after: a single git repo containing all the subprojects
> side-by-side and the *same* “exports” in individual repositories.

Sorry, I sent my earlier reply today before I intended to.

After going back and reading the proposal again, I think I understand
the plan. I haven't used the SVN repository for years so I was thinking
in terms of git, that you'd take the existing git mirrors and combine
them (via submodule or some other mechanism). I understand now the
proposal is to take the SVN root and export all of that as one giant git
repository. Is that correct?

If so, that raises a number of questions for me that aren't directly
addressed in the document as far as I can see:

1. How are the individual component git mirrors going to be maintained?

If a commit goes to the monorepository, what is going to extract the
relevant bits and commit them to the individual mirrors? The document
notes that with a monorepository a single commit can touch multiple
projects (that's good!) but something has to extract the parts of that
commit that are relevant to each subproject and then send those parts to
the subproject repository. There are tools to do this and I think
git-subtree is a good candidate [disclosure: I am the git-subtree
maintainer] but I'm just curious what's being considered as a solution.

2. Is there any consideration for restructuring the directory layout?

The document has this to say about checking out multiple components:

> **Monorepo Proposal**
>
> The repository contains natively the source for every sub-projects at the right
> revision, which makes this straightforward::
>
>   git clone https://github.com/llvm/llvm-projects.git llvm
>   cd llvm
>   git checkout $REVISION
>
> As before, at this point clang, llvm, and libcxx are stored in directories
> alongside each other.

The problem here is that for the build, clang wants to be in llvm/tools
and other components want to be in other places. Should the
monorepository just be structured to have everything in its correct
place for building? My inclination is to say "no" because it reduces
the visibility of the subprojects, but what are the alternatives? There
are two that come to mind off the top of my head, 1) include symlinks in
the repository or 2) change the build so all components can live at the
top level.

I think it's important to think about these kinds of questions because
once a repository layout has been settled on, it's hard to change. Yes,
it is relatively easy to move entire directories to new places in git,
but that not only would require changes to whatever entity updates the
subproject repositories, it's potentially a huge social issue, which are
typically the most difficult problems to address. :)

3. How are the subproject repositories going to be created/migrated?

The individual subproject repositories will have to be created from
scratch after the monorepository is created, right? We can't just
transition the existing git mirrors to the new setup, correct? A
subproject repository reboot would involve some not insignificant pain
for downstream users because their git histories are suddenly invalid.
They would have to fetch a completely different repository and integrate
it into whatever they have.

If there is some way to maintain the existing git mirrors and layer new
monorepository commits on top of the existing history that would be
fantastic. I believe it is technically possible (I might need to add
some enhancements to git-subtree :)) but I don't know if anyone has
explored this. I would love to be told you all have the answers
already. :)

Bisecting

For the multirepository proposal, the document talks about having the
git-bisect run script update each submodule during bisection. I suppose
that will work but the bisection would only report that the failure
exists at a particular commit in the umbrella repository, implying a
bunch of different commits, one for each subproject. It wouldn't really
point to a particular subproject as being the culprit, correct? The
document even hints at this: "it is possible that one commit in the
umbrella repository includes multiple commits in the sub-projects"

That's what I was getting at with my submodule bisect question. It can
only bisect to a granularity of "one of these subprojects at their
respective commits caused the problem." With a true monorepository
bisect can drill down to the exact commit within a subproject or across
multiple subprojects if the commit touched multiple subprojects. To me
this is a giant advantage of a non-submodule-based monorepository, which
I think is what the monorepository proposal is.

If everything I've written here is generally correct, I think the
monorepository will work for us, as long as each subproject repository
is maintained at a granularity of one subproject commit per commit to
the corresponding directory in the monorepository (i.e. full history is
maintained).

Thanks for your work on this. This kind of work is crucially important
but often unrecognized and underappreciated.

-David
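The bisection granularity point can be made concrete with a toy model. This
is a sketch under stated assumptions, not how `git bisect` is implemented:
the commit names, the umbrella snapshot boundaries, and the helper function
are all invented for illustration. A linear history is binary-searched
once per monorepo commit and once per umbrella snapshot, where each
snapshot bundles a range of subproject commits.

```python
def first_bad(items, is_bad):
    """Binary search for the first item where is_bad(item) is true.

    Assumes every item before it is good, every item from it onward is
    bad, and the last item is bad.
    """
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid + 1
    return items[lo]


# Hypothetical linear history as it would appear in a monorepo.
commits = ["llvm:r1", "clang:r2", "llvm:r3", "clang:r4", "llvm:r5", "libcxx:r6"]
bad_from = commits.index("clang:r4")  # the regression lands here

# Monorepo: one bisection step per commit, so the exact commit is found.
mono_index = first_bad(list(range(len(commits))), lambda i: i >= bad_from)
mono_culprit = commits[mono_index]

# Umbrella repository: each snapshot records the (first, last) index of the
# subproject commits it bundles. The boundaries are made up for this example.
umbrella = [(0, 1), (2, 4), (5, 5)]
first, last = first_bad(umbrella, lambda snap: snap[1] >= bad_from)
candidates = commits[first:last + 1]

print("monorepo bisect:", mono_culprit)
print("umbrella bisect:", candidates)
```

In this toy history the monorepo search lands on a single commit, while the
umbrella search can only narrow the regression down to the commits bundled
in one snapshot. How wide that bundle is depends on how often the umbrella
repository is updated.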
Mehdi Amini via llvm-dev
2016-Sep-08 18:37 UTC
[llvm-dev] [RFC] One or many git repositories?
Sent from my iPhone

> On Sep 8, 2016, at 11:08 AM, dag at cray.com wrote:
>
> Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> First, have you read this document: https://reviews.llvm.org/D24167 ?
>>
>> TLDR: The answer is no: you have to see it as it is today, i.e. a
>> single SVN repo containing all the sub-projects, and “exports” in
>> individual repositories.
>
>> The same thing after: a single git repo containing all the subprojects
>> side-by-side and the *same* “exports” in individual repositories.
>
> Sorry, I sent my earlier reply today before I intended to.
>
> After going back and reading the proposal again, I think I understand
> the plan. I haven't used the SVN repository for years so I was thinking
> in terms of git, that you'd take the existing git mirrors and combine
> them (via submodule or some other mechanism). I understand now the
> proposal is to take the SVN root and export all of that as one giant git
> repository. Is that correct?

Yes

> If so, that raises a number of questions for me that aren't directly
> addressed in the document as far as I can see:
>
> 1. How are the individual component git mirrors going to be maintained?

Just exactly as they are today.

> If a commit goes to the monorepository, what is going to extract the
> relevant bits and commit them to the individual mirrors? The document
> notes that with a monorepository a single commit can touch multiple
> projects (that's good!) but something has to extract the parts of that
> commit that are relevant to each subproject and then send those parts to
> the subproject repository.

Right, but note that this is already the case today: some people are
already using SVN to commit to clang and LLVM at the same time, and the
same commit in SVN will result in one commit in the llvm git repo and
another commit in the clang repo.

> There are tools to do this and I think
> git-subtree is a good candidate [disclosure: I am the git-subtree
> maintainer] but I'm just curious what's being considered as a solution.

Well, we haven't decided on anything for the official mirrors. It looks
like you're in a good position to help design how subtree could help
here :)
(I have a fairly good understanding of git, but very limited knowledge
of subtree.)

Anyway, I hope we will be able to put scripts in the repo so that anyone
downstream can split the repo independently of the official mirrors.

> 2. Is there any consideration for restructuring the directory layout?
>
> The document has this to say about checking out multiple components:
>
>> **Monorepo Proposal**
>>
>> The repository contains natively the source for every sub-projects at the right
>> revision, which makes this straightforward::
>>
>>   git clone https://github.com/llvm/llvm-projects.git llvm
>>   cd llvm
>>   git checkout $REVISION
>>
>> As before, at this point clang, llvm, and libcxx are stored in directories
>> alongside each other.
>
> The problem here is that for the build, clang wants to be in llvm/tools
> and other components want to be in other places.

Not exactly: cmake has magic discovery when clang is in tools, but it is
not a requirement. You have been able to do this for years:

  cmake -DLLVM_EXTERNAL_CLANG_SOURCE_DIR=path

> Should the
> monorepository just be structured to have everything in its correct
> place for building? My inclination is to say "no" because it reduces
> the visibility of the subprojects, but what are the alternatives? There
> are two that come to mind off the top of my head, 1) include symlinks in
> the repository or 2) change the build so all components can live at the
> top level.

I'd expect a cmake shortcut:

  cmake -DLLVM_ENABLE_PROJECTS=clang,libcxx,compiler-rt

> I think it's important to think about these kinds of questions because
> once a repository layout has been settled on, it's hard to change. Yes,
> it is relatively easy to move entire directories to new places in git,
> but that not only would require changes to whatever entity updates the
> subproject repositories, it's potentially a huge social issue, which are
> typically the most difficult problems to address. :)
>
> 3. How are the subproject repositories going to be created/migrated?
>
> The individual subproject repositories will have to be created from
> scratch after the monorepository is created, right? We can't just
> transition the existing git mirrors to the new setup, correct?

It depends: there are trade-offs for each option and I think we need to
gather community input to settle on one.

> A
> subproject repository reboot would involve some not insignificant pain
> for downstream users because their git histories are suddenly invalid.
> They would have to fetch a completely different repository and integrate
> it into whatever they have.

If we "reboot" the official git mirrors, I expect we'd provide scripts
for integrating from the new monorepo on top of the existing history.
Ultimately these mirrors are "facilities", but it shouldn't be
significantly harder for downstream to integrate directly from the
monorepo with a bit of scripting, and I suspect this scripting is likely
to be shareable and committed upstream.

> If there is some way to maintain the existing git mirrors and layer new
> monorepository commits on top of the existing history that would be
> fantastic. I believe it is technically possible (I might need to add
> some enhancements to git-subtree :)) but I don't know if anyone has
> explored this. I would love to be told you all have the answers
> already. :)
>
> Bisecting
>
> For the multirepository proposal, the document talks about having the
> git-bisect run script update each submodule during bisection. I suppose
> that will work but the bisection would only report that the failure
> exists at a particular commit in the umbrella repository, implying a
> bunch of different commits, one for each subproject. It wouldn't really
> point to a particular subproject as being the culprit, correct?

Yes, it depends on the frequency of the update of the umbrella.

> The
> document even hints at this: "it is possible that one commit in the
> umbrella repository includes multiple commits in the sub-projects"
>
> That's what I was getting at with my submodule bisect question. It can
> only bisect to a granularity of "one of these subprojects at their
> respective commits caused the problem." With a true monorepository
> bisect can drill down to the exact commit within a subproject or across
> multiple subprojects if the commit touched multiple subprojects. To me
> this is a giant advantage of a non-submodule-based monorepository, which
> I think is what the monorepository proposal is.
>
> If everything I've written here is generally correct, I think the
> monorepository will work for us, as long as each subproject repository
> is maintained at a granularity of one subproject commit per commit to
> the corresponding directory in the monorepository (i.e. full history is
> maintained).
>
> Thanks for your work on this. This kind of work is crucially important
> but often unrecognized and underappreciated.

Thanks :)
If you have any input on parts of the document that can be made clearer,
feel free to chime in on the review.

-- 
Mehdi