Richard W.M. Jones
2018-Feb-09 18:01 UTC
[Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
My contention is that the libguestfs git repository is too large and unwieldy. There are too many separate, unrelated projects and as a result of that the source has too many dependencies and takes too long to build and test.

The project divides (sort of) naturally into layers -- the library, the bindings, the various virt tools -- and could be split along those lines into separate projects which can then be released and evolve at their own pace.

My suggested split would be something like this:

* libguestfs: The library, daemon and appliance. That would include
  the following directories in a single project:
    appliance
    bash
    contrib
    daemon
    docs
    examples
    gnulib
    lib
    logo
    test-tool
    tmp
    utils
    website

* 1 project for each language binding:
    csharp
    erlang
    gobject
    golang
    haskell
    java
    lua
    ocaml
    php
    perl
    python
    ruby

* virt-customize and related tools, we'd probably call this subproject
  "virt-builder". It would include virt-builder, virt-customize and
  virt-sysprep, since they share a lot of common code.

* 1 project for each of the following items:

    small tools written in C
      (virt-cat, virt-filesystems, virt-log, virt-ls, virt-tail,
       virt-diff, virt-edit, virt-format, guestmount, virt-inspector,
       virt-make-fs, virt-rescue)

    guestfish

    virt-alignment-scan and virt-df

    virt-dib

    virt-get-kernel

    virt-resize

    virt-sparsify

    virt-v2v and virt-p2v

    virt-win-reg

* I'd be inclined to drop the legacy Perl tools virt-tar,
  virt-list-filesystems, virt-list-partitions unless someone
  especially wished to step forward to maintain them.

* common code and generator: Off to the side we'd somehow need to
  package up the common code and the generator for use by all of the
  above projects. It wouldn't be a separate project for downstream
  packagers, but instead the code would be included (ie. duplicated)
  in tarballs and upstream available as a side git repo that you'd
  need to include when building (git submodule?). This is somewhat
  unspecified.

M4, PO, and tests would be split between the projects as appropriate.

My proposal would be to do this incrementally, rather than all at once, moving the easier things out first.

Thoughts?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
Eric Blake
2018-Feb-09 19:07 UTC
Re: [Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
On 02/09/2018 12:01 PM, Richard W.M. Jones wrote:
> My contention is that the libguestfs git repository is too large and
> unwieldy. There are too many separate, unrelated projects and as a
> result of that the source has too many dependencies and takes too long
> to build and test.
>
> The project divides (sort of) naturally into layers -- the library,
> the bindings, the various virt tools -- and could be split along those
> lines into separate projects which can then be released and evolve at
> their own pace.

Sounds reasonable to me as an observer. Would you also create a meta-package that has all the other projects as submodules, and which gets a new commit any time any one of the submodules does a release, to still make it easy for someone who wants to grab everything that the old monolithic repo used to provide?

> * common code and generator: Off to the side we'd somehow need to
> package up the common code and the generator for use by all of the
> above projects. It wouldn't be a separate project for downstream
> packagers, but instead the code would be included (ie. duplicated)
> in tarballs and upstream available as a side git repo that you'd
> need to include when building (git submodule?). This is somewhat
> unspecified.

git submodules are a pain to work with sometimes, but they do sound like the best approach for what you are documenting here. Dan Berrange's work on making keycodemapdb a submodule to multiple projects may prove to be a useful inspiration in the process.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2018-Feb-09 19:35 UTC
Re: [Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
On Fri, Feb 09, 2018 at 01:07:13PM -0600, Eric Blake wrote:
> On 02/09/2018 12:01 PM, Richard W.M. Jones wrote:
> >My contention is that the libguestfs git repository is too large and
> >unwieldy. There are too many separate, unrelated projects and as a
> >result of that the source has too many dependencies and takes too long
> >to build and test.
> >
> >The project divides (sort of) naturally into layers -- the library,
> >the bindings, the various virt tools -- and could be split along those
> >lines into separate projects which can then be released and evolve at
> >their own pace.
>
> Sounds reasonable to me as an observer. Would you also create a
> meta-package that has all the other projects as submodules, and
> which gets a new commit any time any one of the submodules does a
> release, to still make it easy for someone who wants to grab
> everything that the old monolithic repo used to provide?

I guess we could although it has a danger of getting out of date if no one works on it.

> >* common code and generator: Off to the side we'd somehow need to
> > package up the common code and the generator for use by all of the
> > above projects. It wouldn't be a separate project for downstream
> > packagers, but instead the code would be included (ie. duplicated)
> > in tarballs and upstream available as a side git repo that you'd
> > need to include when building (git submodule?). This is somewhat
> > unspecified.
>
> git submodules are a pain to work with sometimes, but they do sound
> like the best approach for what you are documenting here. Dan
> Berrange's work on making keycodemapdb a submodule to multiple
> projects may prove to be a useful inspiration in the process.

I'm not a fan of submodules either, but in this one case I do think they would work. It's still an open question how this would translate to tarballs which realistically need to be self-contained.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
Daniel P. Berrangé
2018-Feb-12 09:22 UTC
Re: [Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
On Fri, Feb 09, 2018 at 06:01:53PM +0000, Richard W.M. Jones wrote:
> My contention is that the libguestfs git repository is too large and
> unwieldy. There are too many separate, unrelated projects and as a
> result of that the source has too many dependencies and takes too long
> to build and test.
>
> The project divides (sort of) naturally into layers -- the library,
> the bindings, the various virt tools -- and could be split along those
> lines into separate projects which can then be released and evolve at
> their own pace.
>
> My suggested split would be something like this:
>
> * libguestfs: The library, daemon and appliance. That would include
> the following directories in a single project:
> appliance
> bash
> contrib
> daemon
> docs
> examples
> gnulib
> lib
> logo
> test-tool
> tmp
> utils
> website
>
> * 1 project for each language binding:
> csharp
> erlang
> gobject
> golang
> haskell
> java
> lua
> ocaml
> php
> perl
> python
> ruby

So, the core library above would still include the API description and "make install" it into some location, such that these language bindings can auto-generate themselves, I presume. I guess that means you would rarely need to do releases of these language bindings, as one release ought to be capable of being built against multiple versions of the core library?

> * virt-customize and related tools, we'd probably call this subproject
> "virt-builder". It would include virt-builder, virt-customize and
> virt-sysprep, since they share a lot of common code.

Makes sense to have virt-builder as a thing in its own right.

>
> * 1 project for each of the following items:
>
> small tools written in C
> (virt-cat, virt-filesystems, virt-log, virt-ls, virt-tail,
> virt-diff, virt-edit, virt-format, guestmount, virt-inspector,
> virt-make-fs, virt-rescue)
>
> guestfish
>
> virt-alignment-scan and virt-df
>
> virt-dib
>
> virt-get-kernel
>
> virt-resize
>
> virt-sparsify
>
> virt-v2v and virt-p2v
>
> virt-win-reg

Ok

>
> * I'd be inclined to drop the legacy Perl tools virt-tar,
> virt-list-filesystems, virt-list-partitions unless someone
> especially wished to step forward to maintain them.
>
> * common code and generator: Off to the side we'd somehow need to
> package up the common code and the generator for use by all of the
> above projects. It wouldn't be a separate project for downstream
> packagers, but instead the code would be included (ie. duplicated)
> in tarballs and upstream available as a side git repo that you'd
> need to include when building (git submodule?). This is somewhat
> unspecified.

I guess sub-modules are reasonable for this, unless you actually modularized the generator itself, such that the language binding generation code could be a loadable module. That way the core generator could be in the core library (and its -devel) package, and the language binding repo could have the language-specific plugin for the generator?

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
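The question of whether one bindings release can be built against several versions of the core library can be framed as a simple version comparison. The following is only a minimal sketch in Python: the helper names `parse_version` and `meets_minimum` are hypothetical, the version numbers in the usage note are made up, and how a real binding would discover the installed library version is not specified in this thread.

```python
from typing import Tuple

# A version is modelled as (major, minor, release); this layout is an
# assumption made for the sketch, not something the thread defines.
Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.release' string into a tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.release, got {text!r}")
    major, minor, release = (int(p) for p in parts)
    return (major, minor, release)


def meets_minimum(found: Version, required: Version) -> bool:
    """Return True if the found version is at least the required one."""
    # Tuples compare element by element, so no manual field-by-field
    # comparison is needed.
    return found >= required
```

A bindings build could call `meets_minimum(parse_version("1.39.0"), (1, 38, 0))` early and refuse to continue when the result is false.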
Richard W.M. Jones
2018-Feb-12 12:09 UTC
Re: [Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
On Mon, Feb 12, 2018 at 09:22:30AM +0000, Daniel P. Berrangé wrote:
> On Fri, Feb 09, 2018 at 06:01:53PM +0000, Richard W.M. Jones wrote:
> > My contention is that the libguestfs git repository is too large and
> > unwieldy. There are too many separate, unrelated projects and as a
> > result of that the source has too many dependencies and takes too long
> > to build and test.
> >
> > The project divides (sort of) naturally into layers -- the library,
> > the bindings, the various virt tools -- and could be split along those
> > lines into separate projects which can then be released and evolve at
> > their own pace.
> >
> > My suggested split would be something like this:
> >
> > * libguestfs: The library, daemon and appliance. That would include
> > the following directories in a single project:
> > appliance
> > bash
> > contrib
> > daemon
> > docs
> > examples
> > gnulib
> > lib
> > logo
> > test-tool
> > tmp
> > utils
> > website
> >
> > * 1 project for each language binding:
> > csharp
> > erlang
> > gobject
> > golang
> > haskell
> > java
> > lua
> > ocaml
> > php
> > perl
> > python
> > ruby
>
> So, the core library above would still include the API description
> and "make install" it into some location, such that these language
> bindings can auto-generate themselves, I presume. I guess that means
> you would rarely need to do releases of these language bindings, as
> one release ought to be capable of being built against multiple
> versions of the core library?

Certainly the language bindings are the hardest to deal with but also the most important to move out in terms of reducing dependencies. The "API description" continues to be the generator, turned into a git submodule and shared across all of them. But it's not a fully formed plan.

One particular difficulty is - as you note - that some of the bindings cannot be compiled against a different version of libguestfs (we discovered this when we turned the Python bindings into a PyPI module), so likely they'd all need to be released at the same time, or else modified so at least they need a minimum version of libguestfs.

> > * I'd be inclined to drop the legacy Perl tools virt-tar,
> > virt-list-filesystems, virt-list-partitions unless someone
> > especially wished to step forward to maintain them.
> >
> > * common code and generator: Off to the side we'd somehow need to
> > package up the common code and the generator for use by all of the
> > above projects. It wouldn't be a separate project for downstream
> > packagers, but instead the code would be included (ie. duplicated)
> > in tarballs and upstream available as a side git repo that you'd
> > need to include when building (git submodule?). This is somewhat
> > unspecified.
>
> I guess sub-modules are reasonable for this, unless you actually
> modularized the generator itself, such that the language binding
> generation code could be a loadable module. That way the core
> generator could be in the core library (and its -devel) package,
> and the language binding repo could have the language-specific
> plugin for the generator?

We can keep the generator as a single program with only a small modification (it needs to check if a directory exists before putting files there). How exactly this all works when compiling from tarballs is also not clear.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
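The "check if a directory exists before putting files there" modification can be illustrated with a short sketch. The real generator is written in OCaml, so the Python below only shows the idea. The function name `write_if_dir_exists` and the file names in the demo are hypothetical and do not come from the libguestfs tree.

```python
import os
import tempfile


def write_if_dir_exists(path: str, content: str) -> bool:
    """Write content to path only if its parent directory already exists.

    Returns True if the file was written, False if it was skipped.
    """
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        # In a split-up checkout, a missing directory is taken to mean
        # that this output belongs to some other project, so skip it.
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


if __name__ == "__main__":
    # Demo: only the "python" directory exists in this throwaway tree,
    # so only its output file is generated.
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "python"))
        for target in ("python/bindings.c", "ruby/bindings.c"):
            written = write_if_dir_exists(
                os.path.join(top, target), "/* generated */\n"
            )
            print(target, "written" if written else "skipped")
```

Skipping quietly, rather than creating the missing directory, is the design choice that lets one generator be shared by several repositories that each contain only part of the tree.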
Pino Toscano
2019-Apr-30 16:28 UTC
Re: [Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
On Friday, 9 February 2018 19:01:53 CEST Richard W.M. Jones wrote:
> My contention is that the libguestfs git repository is too large and
> unwieldy. There are too many separate, unrelated projects and as a
> result of that the source has too many dependencies and takes too long
> to build and test.
>
> The project divides (sort of) naturally into layers -- the library,
> the bindings, the various virt tools -- and could be split along those
> lines into separate projects which can then be released and evolve at
> their own pace.

As other replies to this email also say, splitting the tools and bindings may be very complex, so for now it is still too distant a goal.

However...

> My suggested split would be something like this:
>
> [...]
> virt-v2v and virt-p2v

I'd rather split virt-p2v into its own repository. There are various reasons for this:
- it does not use libguestfs (the library), just the tools for testing
  stuff
- the communication with virt-v2v is done via network, and its
  capabilities are dynamically probed (so theoretically virt-p2v and
  virt-v2v can be used even when their versions are odd)
- it is written only in C

However, even if it looks simple, in reality there are a number of common things used from the rest of the libguestfs tree:
1) gnulib
2) some build system bits (e.g. m4/guestfs-v2v.m4)
3) auto-cleanup bits (e.g. CLEANUP_FREE), although only a few are used
   (CLEANUP_FREE, CLEANUP_FREE_STRING_LIST, CLEANUP_PCLOSE,
   CLEANUP_FCLOSE, and CLEANUP_XMLFREETEXTWRITER)
4) other internal macros, i.e. guestfs-utils.h
5) the list of credits generated by the generator
   (i.e. generator/authors.ml)
6) the p2v configuration generated by the generator
   (i.e. generator/p2v_config.ml)
7) test images/data (phony images, and virt-tools)
8) the miniexpect module, right now out of the p2v subdirectory

Possible solutions might be:
1) add own submodule (use its own set of modules)
2) copy/implement them locally: luckily they are not many, so
   inlining them in configure.ac will not be a problem; the common
   bits (e.g. the distro detection from os-release) can be split into
   its own module in libguestfs, copying it into p2v
3/4) have a local version of them; not pretty, although they are not
   that many
5) this list is reflected in two places: the p2v/about-authors.c file,
   and the AUTHORS file (theoretically mandatory for automake, unless
   "foreign" is used, which it is); my idea was to go back to a manually
   written about-authors.c file without the libguestfs credits, leaving
   the few p2v ones easy to manage; the same for the AUTHORS file
6) this is a bit more complex: my idea was to keep it as an OCaml script
   to run at build time, instead of being statically shipped at dist
   time
7) create their own versions at test time using guestfish/virt-builder;
   maybe use a fedora image, instead of a phony windows one (will avoid
   hivex for the tests)
8)

The other problem is how to split the repository, as the various bits are in different places:

a) git filter-branch --subdirectory-filter p2v
   + very small repo with the current p2v subdirectory
   + preserves the history of the p2v subdirectory, with branches and tags
   - missing all the other bits, which will have no history
   - not usable to build older releases (e.g. for bisecting)

b) create a work branch in libguestfs, then in that branch move/copy all
   the stuff making the p2v subdirectory build standalone there, and then
   import the content of the p2v subdirectory of that branch into a new
   empty repo
   + very small repo with the current p2v subdirectory
   - no history, no tags nor branches
   + using a graft it is possible to "stitch" the history of the new repo
     with the work branch in libguestfs

c) git filter-branch to remove all the bits not related to p2v from all
   the commits
   + not that big a repo
   + preserves the history of all the content, with branches and tags
   - will take a very long time to create (e.g. iterate over and over to
     find out what to remove)
   - not usable to build older releases (e.g. for bisecting)

Thanks,
--
Pino Toscano
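The "distro detection from os-release" mentioned among the possible solutions amounts to reading a handful of KEY=value lines. In the project that check lives in the build system, so the Python below is only an illustration of the file format, not the project's code. The function name `parse_os_release` is hypothetical, and the sample values in the usage note are invented.

```python
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style text into a dict of KEY -> value."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split strips the quoting.
        parts = shlex.split(raw)
        result[key] = parts[0] if parts else ""
    return result
```

Given text such as `ID=fedora` followed by `VERSION_ID=30`, the returned dict maps `"ID"` to `"fedora"`, which a build script could then use to select distro-specific settings.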
Richard W.M. Jones
2019-Jun-10 15:35 UTC
Re: [Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs
Sorry for the late reply to this ...

On Tue, Apr 30, 2019 at 06:28:01PM +0200, Pino Toscano wrote:
> On Friday, 9 February 2018 19:01:53 CEST Richard W.M. Jones wrote:
> > My contention is that the libguestfs git repository is too large and
> > unwieldy. There are too many separate, unrelated projects and as a
> > result of that the source has too many dependencies and takes too long
> > to build and test.
> >
> > The project divides (sort of) naturally into layers -- the library,
> > the bindings, the various virt tools -- and could be split along those
> > lines into separate projects which can then be released and evolve at
> > their own pace.
>
> As other replies to this email also say, splitting the tools and
> bindings may be very complex, so for now it is still too distant a
> goal.
>
> However...
>
> > My suggested split would be something like this:
> >
> > [...]
> > virt-v2v and virt-p2v
>
> I'd rather split virt-p2v into its own repository. There are various
> reasons for this:
> - it does not use libguestfs (the library), just the tools for testing
> stuff
> - the communication with virt-v2v is done via network, and its
> capabilities are dynamically probed (so theoretically virt-p2v and
> virt-v2v can be used even when their versions are odd)
> - it is written only in C
>
> However, even if it looks simple, in reality there are a number of
> common things used from the rest of the libguestfs tree:
> 1) gnulib

We hardly use gnulib in virt-p2v. I think it's only used for ignore-value.h, getprogname.h, and c-ctype.h, all of which are likely to be easily worked around.

> 2) some build system bits (e.g. m4/guestfs-v2v.m4)

Right, although this in itself should be split up, so no bad thing.

> 3) auto-cleanup bits (e.g. CLEANUP_FREE), although only a few are used
> (CLEANUP_FREE, CLEANUP_FREE_STRING_LIST, CLEANUP_PCLOSE,
> CLEANUP_FCLOSE, and CLEANUP_XMLFREETEXTWRITER)
> 4) other internal macros, i.e. guestfs-utils.h

Common code is a bit trickier, as is ...

> 5) the list of credits generated by the generator
> (i.e. generator/authors.ml)
> 6) the p2v configuration generated by the generator
> (i.e. generator/p2v_config.ml)

... the generator and ...

> 7) test images/data (phony images, and virt-tools)

test data.

> 8) the miniexpect module, right now out of the p2v subdirectory

This is only used by virt-p2v I think, so it could go with virt-p2v or be made into a separate project.

> Possible solutions might be:
> 1) add own submodule (use its own set of modules)

I think we should ditch gnulib as much as possible, so see above.

> 2) copy/implement them locally: luckily they are not many, so
> inlining them in configure.ac will not be a problem; the common
> bits (e.g. the distro detection from os-release) can be split into
> its own module in libguestfs, copying it into p2v
> 3/4) have a local version of them; not pretty, although they are not
> that many
> 5) this list is reflected in two places: the p2v/about-authors.c file,
> and the AUTHORS file (theoretically mandatory for automake, unless
> "foreign" is used, which it is); my idea was to go back to a manually
> written about-authors.c file without the libguestfs credits, leaving
> the few p2v ones easy to manage; the same for the AUTHORS file
> 6) this is a bit more complex: my idea was to keep it as an OCaml script
> to run at build time, instead of being statically shipped at dist
> time
> 7) create their own versions at test time using guestfish/virt-builder;
> maybe use a fedora image, instead of a phony windows one (will avoid
> hivex for the tests)
> 8)

So while I'm not a massive fan of git submodules, now that I have used them a few times with riscv stuff, they do solve a certain problem as long as they are managed carefully. I think the common code and the generator are cases where a submodule or two would work.

Does this mean we need to move immediately to a submodule if just splitting virt-p2v, or copy code as you suggest? Maybe not, because you can imagine for just this project copying the code needed from the common/ directory, and creating a new "mini-generator" for the project which handles the little bits that need to be generated in virt-p2v. However in the long term if we split up everything a submodule or two does seem to make sense, so maybe we should start there?

> The other problem is how to split the repository, as the various bits
> are in different places:
> a) git filter-branch --subdirectory-filter p2v
> + very small repo with the current p2v subdirectory
> + preserves the history of the p2v subdirectory, with branches and tags
> - missing all the other bits, which will have no history
> - not usable to build older releases (e.g. for bisecting)

I'm not exactly sure what this does. Is this something to do with preserving the history? TBH I don't think we need to bother with the history -- it exists still in libguestfs.git.

> b) create a work branch in libguestfs, then in that branch move/copy all
> the stuff making the p2v subdirectory build standalone there, and then
> import the content of the p2v subdirectory of that branch into a new
> empty repo
> + very small repo with the current p2v subdirectory
> - no history, no tags nor branches
> + using a graft it is possible to "stitch" the history of the new repo
> with the work branch in libguestfs
>
> c) git filter-branch to remove all the bits not related to p2v from all
> the commits
> + not that big a repo
> + preserves the history of all the content, with branches and tags
> - will take a very long time to create (e.g. iterate over and over to
> find out what to remove)
> - not usable to build older releases (e.g. for bisecting)

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/