Oliver Schneider
2015-Feb-27 10:27 UTC
[LLVMdev] SVN dump seed file (was: svnsync of llvm tree)
Hi folks,

in a rather old thread on this list titled "svnsync of llvm tree" <http://comments.gmane.org/gmane.comp.compilers.llvm.devel/42523> we noticed that an svnsync would fail due to a few particularly big commits that apparently caused OOM conditions on the server. The error and the revision number were consistent for different people.

That seems to be fixed now. I succeeded in pulling a full "clone" of the SVN repository.

Back then it was noted that an SVN dump seed file could make it easier for people to start svnsync-ing the LLVM source tree. Now, I'm hoping to convince you to provide a seed file for another reason than just the error from back then.

To my knowledge, when using svnsync (just like with svn) one actually transfers approximately the size of the dump file of the resulting repo. Uncompressed that happens to be almost 15 Gigabytes, even though the resulting SVN repository is only roughly 3.8 Gigabytes in size. For this experiment I dumped the revision range 0:230000 to a file just to have a "clean" range.

After compressing the dump with the lzma utility I had a file of 419 Megabytes.

If there is interest, I can upload the compressed dump (or make it available for download), including a PGP signature on the files, and so on, for you to make available on the official servers. I'll even add detailed steps on how to get to a repository again from there and how to keep it up-to-date. And yes, I adjusted the repo UUID to match the remote one (which is utterly useful to keep a local synchronized repo and 'svn relocate' between upstream and the synchronized one depending on availability or mood).

All I really did was:

    svnadmin dump $(pwd) -r 0:230000 > llvm.svndump

from my svnsync "clone" with the already adjusted UUID. And then "lzma -k9e" the resulting dump file (the compression took more than 2 hours).

The main point is that anyone trying to start synchronizing now will have to transfer ~15 GiB of data to get to the current point. That can be cut to ~420 MiB by providing a seed file, in the described case for the revision range 0:230000 (and additional chunks such as 230000:250000 could be added later, or the base seed file could be updated accordingly).

Hope someone in charge reads and considers this.

With best regards,

Oliver

PS: feel free to contact me off-list about that, too.
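
For illustration only, getting from such a seed file back to a synchronized repository might look roughly like the sketch below. It is not the procedure offered in the message above. The function names and the arguments SEED, MIRROR and UPSTREAM are placeholders invented for this example, the permissive pre-revprop-change hook is the simplest thing svnsync will accept rather than a recommendation, and `--allow-non-empty` assumes Subversion 1.7 or later. The upstream URL is deliberately left to the caller.

```shell
#!/bin/sh
# Sketch only: rebuild a local svnsync mirror from a compressed seed dump.
# All names here (functions, SEED, MIRROR, UPSTREAM) are hypothetical.

# Turn an absolute repository path into a file:// URL.
mirror_url() {
    printf 'file://%s\n' "$1"
}

# Write a pre-revprop-change hook that permits every revprop change;
# svnsync needs such a hook to copy revision properties into the mirror.
install_revprop_hook() {
    hook="$1/hooks/pre-revprop-change"
    printf '#!/bin/sh\nexit 0\n' > "$hook"
    chmod +x "$hook"
}

# seed_mirror SEED MIRROR UPSTREAM
seed_mirror() {
    seed=$1
    mirror=$2
    upstream=$3
    url=$(mirror_url "$mirror")

    svnadmin create "$mirror" || return 1
    install_revprop_hook "$mirror" || return 1

    # Stream the decompressed dump straight into the new repository.
    # --force-uuid makes the repository take its UUID from the dump.
    lzma -dc "$seed" | svnadmin load --quiet --force-uuid "$mirror" || return 1

    # A dump taken from an svnsync mirror may already carry the svn:sync-*
    # bookkeeping on revision 0; only initialize when it is missing.
    if [ -z "$(svn propget --revprop -r 0 svn:sync-from-url "$url" 2>/dev/null)" ]; then
        svnsync initialize --allow-non-empty "$url" "$upstream" || return 1
    fi

    # Fetch everything committed upstream after the last seeded revision.
    svnsync synchronize "$url"
}
```

A call could then look like `seed_mirror llvm.svndump.lzma /srv/svn/llvm-mirror "$UPSTREAM"`, where the mirror path is made up and `UPSTREAM` holds the URL of the official repository. Only the revisions after the seeded range would then travel over the network.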
David Chisnall
2015-Feb-27 10:41 UTC
[LLVMdev] SVN dump seed file (was: svnsync of llvm tree)
Hi,

I think it would be easier to understand why you want this if you had a use case for having an svnsync clone. Aside from backing up the repository, it seems like a fairly useless thing: you can't do local commits and then upstream them and you can't do

If you want the complete history of the repository, then a git clone of the git-svn mirror will give you this very cheaply, with the added bonus that you can then commit to the local copy and still push things upstream (and merge changes from upstream). A fresh clone of the llvm and clang git mirrors transfers about 310MB for LLVM and about 190MB for Clang.

What do you want to do with the svnsync copy?

David
Hi,

On 2015-02-27 10:41, David Chisnall wrote:
> I think it would be easier to understand why you want this if you had
> a use case for having an svnsync clone. Aside from backing up the
> repository, it seems like a fairly useless thing: you can't do local
> commits and then upstream them and you can't do

Well, more or less for backup purposes, yes. Local commits are not necessary if, say, I rarely commit, because I could 'svn relocate' my working copy to point to the upstream repo before committing. And to reinforce that I really don't want to commit to my local "clone", I can install a hook script preventing me from doing that.

A very realistic use case is that people in a big organization could use the same local copy to check out and the upstream repo to commit. SVN over HTTP isn't exactly the most efficient protocol there is, so if I can skip thousands of revisions via SVN over HTTP and get going more quickly, that helps a lot.

> If you want the complete history of the repository, then a git clone
> of the git-svn mirror will give you this very cheaply and with the
> added bonus that you can then commit to the local copy and still push
> things upstream (and merge changes from upstream). A fresh clone of
> the llvm and clang git mirrors transfers about 310MB for LLVM and
> about 190MB for Clang.

And the desire for getting that in SVN form mainly comes from the imperfect representation of the history in the git-svn mirror (as you will also find from some recent and older threads). Besides, the llvm-project SVN repo contains *everything*, whereas the Git mirrors provide only a handful of slices, as far as I know.

I know git-svn kind of works, but it's a crutch.
But of course it provides the means for cooperation between SVN and Git users, so that aspect is good.

> What do you want to do with the svnsync copy?

Personally I wanted to keep an updated SVN copy and work on providing a better (continuously updated) Git and Mercurial representation of the repository (or repositories, need to look into that). I have gathered experience with repository conversion (using reposurgeon) and thought I could apply it to something more complex, too. I also learned that there are quite a few "Subversionisms" that can cause problems in the conversion. That is likely one of the reasons why the git-svn mirrors are not quite perfect.

This could be quite useful if and when LLVM decides to move to any other version control system that supports git-fast-import streams. For this, of course, one needs a true-to-the-bit copy of the repo and not something like the imperfect git-svn mirror :)

Also within our company I'd like to use it in the same way I described above (it's not the first repo I do this with, but it's one of the bigger ones). On the local network the transfer even of large amounts of data is a lot faster. I cannot say I ever enjoyed the speed of SVN much, but at least on the local net it becomes less of a nuisance.

Of course it's easy to put me on the spot about use *cases* when I only have one or two for myself, but there are probably more use cases out there due to some of the inherent weaknesses of SVN compared to distributed version control systems. The mailing list thread back then proved there was demand for it.

Creating a dump is a one-time effort. It can be (but doesn't have to be) redone every few tens of thousands of revisions to provide additional chunks on top of the initial seed file or even to create a fresh base seed file. And it saves traffic.

I cannot offer more reasons than that, sorry. Perhaps someone else can.

Oliver
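
As a rough illustration of the hook script mentioned earlier in this message, a start-commit hook along the following lines could keep ordinary commits out of the local mirror. This is a sketch under assumptions, not the author's actual script. The account name `svnsync` and the function names are invented for the example, and it presumes the mirror is updated by a dedicated account that differs from the people who check out from it.

```shell
#!/bin/sh
# Sketch of a start-commit hook for a read-only mirror (hooks/start-commit).
# Subversion calls the hook with the repository path as $1 and the
# committing user as $2. SYNC_USER is a placeholder account name.

SYNC_USER=${SYNC_USER:-svnsync}

# Succeed only when the given user is the dedicated sync account.
may_commit() {
    [ "$1" = "$SYNC_USER" ]
}

# start_commit REPOS USER: the hook's decision, kept in a function so it
# can be exercised without a repository.
start_commit() {
    if may_commit "$2"; then
        return 0
    fi
    echo "This repository is a read-only mirror; commit upstream instead." >&2
    return 1
}

# Subversion always passes arguments; without any, the script does nothing.
[ $# -eq 0 ] || start_commit "$@"
```

Saved as `hooks/start-commit` in the mirror and made executable, the script rejects every commit that does not come from the sync account. A working copy would then be pointed at the upstream repository with 'svn relocate' before committing, as described above.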