Laszlo Ersek
2021-Oct-01 10:35 UTC
[Libguestfs] translating CD-ROM device paths from i440fx to Q35 in virt-v2v
Hi Rich, (dropping libvirt-devel) On 09/30/21 13:53, Richard W.M. Jones wrote:> Also we don't currently try to find or rewrite /dev/disk/ paths in > guest configuration files. The only rewriting that happens is for > /dev/[hs]d* block device filenames and a few others. The actual code > that does this is convert/convert_linux.ml:remap_block_devices > > So I wouldn't over-think this. It's likely fine to identify such > devices and rewrite them as "/dev/cdrom", assuming that (I didn't > check) udev creates that symlink for any reasonably modern Linux. And > if there's more than one attached CD to the source, only convert the > first one and warn about but drop the others.So I've started reading "convert/convert_linux.ml" in parallel with the OCaml manual. (I dabbled for a few weeks in Haskell when everyone else did a few years ago, so it's not 100% unfamiliar.) Question: let family match inspect.i_distro with | "fedora" | "rhel" | "centos" | "scientificlinux" | "redhat-based" | "oraclelinux" -> `RHEL_family | "altlinux" -> `ALT_family | "sles" | "suse-based" | "opensuse" -> `SUSE_family | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family Here, RHEL_family, ALT_family, SUSE_family, Debian_family are not plain constructors ("fixed" variants) [1] but polymorphic ones [2]. Why? Was this done *only* in order so that an explicit type definition such as type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family could be avoided? Based on my (incomplete understanding) of the ocaml docs, this looks like a bad idea. We need no polymorphism here, and using a static type (a fixed variant) is generally beneficial [3]. The following (minimally modified) definition at the OCaml REPL: let family1 distro match distro with | "fedora" | "rhel" | "centos" | "scientificlinux" | "redhat-based" | "oraclelinux" -> `RHEL_family | "altlinux" -> `ALT_family | "sles" | "suse-based" | "opensuse" -> `SUSE_family | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family | _ -> assert false;; deduces the following type: string -> [> `ALT_family | `Debian_family | `RHEL_family | `SUSE_family ] = <fun> with the nasty "[>" mark at the start (allowing for further type refinement [2], which I think we don't need here). Conversely. type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family;; let family2 distro match distro with | "fedora" | "rhel" | "centos" | "scientificlinux" | "redhat-based" | "oraclelinux" -> RHEL_family | "altlinux" -> ALT_family | "sles" | "suse-based" | "opensuse" -> SUSE_family | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> Debian_family | _ -> assert false;; (note the explicit os_family type definition, and the removal of the backticks from the constructor names) comes back with the type: val family2 : string -> os_family = <fun> Basically converting a string to an enum constant. So, what's the reason for the polymorphic variant? [1] https://ocaml.org/manual/coreexamples.html#s%3Atut-recvariants [2] https://ocaml.org/manual/polyvariant.html#sec48 [3] https://ocaml.org/manual/polyvariant.html#s%3Apolyvariant-weaknesses Thanks! Laszlo
Richard W.M. Jones
2021-Oct-01 11:03 UTC
[Libguestfs] translating CD-ROM device paths from i440fx to Q35 in virt-v2v
On Fri, Oct 01, 2021 at 12:35:50PM +0200, Laszlo Ersek wrote:> Hi Rich, > > (dropping libvirt-devel) > > On 09/30/21 13:53, Richard W.M. Jones wrote: > > > Also we don't currently try to find or rewrite /dev/disk/ paths in > > guest configuration files. The only rewriting that happens is for > > /dev/[hs]d* block device filenames and a few others. The actual code > > that does this is convert/convert_linux.ml:remap_block_devices > > > > So I wouldn't over-think this. It's likely fine to identify such > > devices and rewrite them as "/dev/cdrom", assuming that (I didn't > > check) udev creates that symlink for any reasonably modern Linux. And > > if there's more than one attached CD to the source, only convert the > > first one and warn about but drop the others. > > So I've started reading "convert/convert_linux.ml" in parallel with the > OCaml manual. (I dabbled for a few weeks in Haskell when everyone else > did a few years ago, so it's not 100% unfamiliar.) > > Question: > > let family > match inspect.i_distro with > | "fedora" > | "rhel" | "centos" | "scientificlinux" | "redhat-based" > | "oraclelinux" -> `RHEL_family > | "altlinux" -> `ALT_family > | "sles" | "suse-based" | "opensuse" -> `SUSE_family > | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family > > Here, RHEL_family, ALT_family, SUSE_family, Debian_family are not plain > constructors ("fixed" variants) [1] but polymorphic ones [2]. > > Why? > > Was this done *only* in order so that an explicit type definition such > as > > type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family > > could be avoided?There are two types of variants in OCaml. The first is the normal one and it requires a type definition. eg: type t = A | B | C of int | D of string * int with values like ?A? or ?C 3? or ?D ("hello", 1)?. These are ones we'd normally use and have the (arguable?) advantage that you have to write (and therefore, document) the full type somewhere. The second type are polymorphic variants. These are actually way more complicated than I'm going to describe below because they have a load of set theory behind them (if you really want to know see the docs that you've already linked to). They are written with backtick before the name of the variant. The brief idea is that you let the OCaml compiler work out the type for you. For example: let f = function 1 -> `A | 2 -> `B | i -> `C i creates a function f which returns the type: type pv = [ `A | `B | `C of int ] Actually if you type this into the toplevel it says: $ rlwrap ocaml OCaml version 4.12.0 # let f = function 1 -> `A | 2 -> `B | i -> `C i ;; val f : int -> [> `A | `B | `C of int ] = <fun> As you observe below there's an extra ">" character. That means that the type is "at least" that one, and there might be more cases, but the function that I wrote only contains the 3 cases. Anyway the short of this is that these open-ended, automatically resolved types don't require a type definition so they're less documented but more flexible. (And another thing: It's possible to reuse the constructors in unrelated types. eg. `A might be part of another type. But don't do this unless it's for a very good reason). In all cases the OCaml compiler enforces type safety, so although these types are less explicit, they're still safe. If I extend my example above and forget about one of the cases, the compiler will warn (in the v2v code, we've set up the flags to give a hard error): # let g j = match f j with `A -> 1 | `B -> 2 ;; Warning 8 [partial-match]: this pattern-matching is not exhaustive. Here is an example of a case that is not matched: `C _ val g : int -> int = <fun>> Based on my (incomplete understanding) of the ocaml docs, this looks > like a bad idea. We need no polymorphism here, and using a static type > (a fixed variant) is generally beneficial [3]. > > The following (minimally modified) definition at the OCaml REPL: > > let family1 distro > match distro with > | "fedora" > | "rhel" | "centos" | "scientificlinux" | "redhat-based" > | "oraclelinux" -> `RHEL_family > | "altlinux" -> `ALT_family > | "sles" | "suse-based" | "opensuse" -> `SUSE_family > | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family > | _ -> assert false;; > > deduces the following type: > > string -> [> `ALT_family | `Debian_family | `RHEL_family | `SUSE_family ] = <fun> > > with the nasty "[>" mark at the start (allowing for further type > refinement [2], which I think we don't need here). Conversely. > > type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family;; > > let family2 distro > match distro with > | "fedora" > | "rhel" | "centos" | "scientificlinux" | "redhat-based" > | "oraclelinux" -> RHEL_family > | "altlinux" -> ALT_family > | "sles" | "suse-based" | "opensuse" -> SUSE_family > | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> Debian_family > | _ -> assert false;; > > (note the explicit os_family type definition, and the removal of the > backticks from the constructor names) comes back with the type: > > val family2 : string -> os_family = <fun>Yes, this is all fair comment.> Basically converting a string to an enum constant. > > So, what's the reason for the polymorphic variant?I guess you could look at git history, but most likely it was "just because". It's a bit quicker and adding a new cases involves less code churn. I'm not opposed if you want to make the change above if you think it improves readability. In general I try to avoid tricksy features of OCaml, like functors, and this could (arguably) be one of those cases. BTW as an aside the implementation of the two types of constructors is completely different, and polymorphic variants have a slower implementation. For ordinary constructors they generally map to integers, which makes them very efficient (eg. they can be stored in registers etc). Polymorphic variants use a hash of the name instead making them slower overall. (see table in https://rwmj.wordpress.com/2009/08/05/ocaml-internals-part-2-strings-and-other-types/) But a fun thing is you can cause hash collisions (although it is picked up by the compiler - it's still type safe!): # type collision = [`Eric_Cooper | `azdwbie];; Error: Variant tags `azdwbie and `Eric_Cooper have the same hash value. Change one of them. (https://groups.google.com/g/fa.caml/c/68oTntkRLyU) There's even a link-time pass to find hash collisions across the whole program. Rich.> [1] https://ocaml.org/manual/coreexamples.html#s%3Atut-recvariants > [2] https://ocaml.org/manual/polyvariant.html#sec48 > [3] https://ocaml.org/manual/polyvariant.html#s%3Apolyvariant-weaknesses > > Thanks! > Laszlo-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/