Richard W.M. Jones
2021-Oct-01 11:03 UTC
[Libguestfs] translating CD-ROM device paths from i440fx to Q35 in virt-v2v
On Fri, Oct 01, 2021 at 12:35:50PM +0200, Laszlo Ersek wrote:> Hi Rich, > > (dropping libvirt-devel) > > On 09/30/21 13:53, Richard W.M. Jones wrote: > > > Also we don't currently try to find or rewrite /dev/disk/ paths in > > guest configuration files. The only rewriting that happens is for > > /dev/[hs]d* block device filenames and a few others. The actual code > > that does this is convert/convert_linux.ml:remap_block_devices > > > > So I wouldn't over-think this. It's likely fine to identify such > > devices and rewrite them as "/dev/cdrom", assuming that (I didn't > > check) udev creates that symlink for any reasonably modern Linux. And > > if there's more than one attached CD to the source, only convert the > > first one and warn about but drop the others. > > So I've started reading "convert/convert_linux.ml" in parallel with the > OCaml manual. (I dabbled for a few weeks in Haskell when everyone else > did a few years ago, so it's not 100% unfamiliar.) > > Question: > > let family > match inspect.i_distro with > | "fedora" > | "rhel" | "centos" | "scientificlinux" | "redhat-based" > | "oraclelinux" -> `RHEL_family > | "altlinux" -> `ALT_family > | "sles" | "suse-based" | "opensuse" -> `SUSE_family > | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family > > Here, RHEL_family, ALT_family, SUSE_family, Debian_family are not plain > constructors ("fixed" variants) [1] but polymorphic ones [2]. > > Why? > > Was this done *only* in order so that an explicit type definition such > as > > type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family > > could be avoided?There are two types of variants in OCaml. The first is the normal one and it requires a type definition. eg: type t = A | B | C of int | D of string * int with values like ?A? or ?C 3? or ?D ("hello", 1)?. These are ones we'd normally use and have the (arguable?) advantage that you have to write (and therefore, document) the full type somewhere. The second type are polymorphic variants. These are actually way more complicated than I'm going to describe below because they have a load of set theory behind them (if you really want to know see the docs that you've already linked to). They are written with backtick before the name of the variant. The brief idea is that you let the OCaml compiler work out the type for you. For example: let f = function 1 -> `A | 2 -> `B | i -> `C i creates a function f which returns the type: type pv = [ `A | `B | `C of int ] Actually if you type this into the toplevel it says: $ rlwrap ocaml OCaml version 4.12.0 # let f = function 1 -> `A | 2 -> `B | i -> `C i ;; val f : int -> [> `A | `B | `C of int ] = <fun> As you observe below there's an extra ">" character. That means that the type is "at least" that one, and there might be more cases, but the function that I wrote only contains the 3 cases. Anyway the short of this is that these open-ended, automatically resolved types don't require a type definition so they're less documented but more flexible. (And another thing: It's possible to reuse the constructors in unrelated types. eg. `A might be part of another type. But don't do this unless it's for a very good reason). In all cases the OCaml compiler enforces type safety, so although these types are less explicit, they're still safe. If I extend my example above and forget about one of the cases, the compiler will warn (in the v2v code, we've set up the flags to give a hard error): # let g j = match f j with `A -> 1 | `B -> 2 ;; Warning 8 [partial-match]: this pattern-matching is not exhaustive. Here is an example of a case that is not matched: `C _ val g : int -> int = <fun>> Based on my (incomplete understanding) of the ocaml docs, this looks > like a bad idea. We need no polymorphism here, and using a static type > (a fixed variant) is generally beneficial [3]. > > The following (minimally modified) definition at the OCaml REPL: > > let family1 distro > match distro with > | "fedora" > | "rhel" | "centos" | "scientificlinux" | "redhat-based" > | "oraclelinux" -> `RHEL_family > | "altlinux" -> `ALT_family > | "sles" | "suse-based" | "opensuse" -> `SUSE_family > | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family > | _ -> assert false;; > > deduces the following type: > > string -> [> `ALT_family | `Debian_family | `RHEL_family | `SUSE_family ] = <fun> > > with the nasty "[>" mark at the start (allowing for further type > refinement [2], which I think we don't need here). Conversely. > > type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family;; > > let family2 distro > match distro with > | "fedora" > | "rhel" | "centos" | "scientificlinux" | "redhat-based" > | "oraclelinux" -> RHEL_family > | "altlinux" -> ALT_family > | "sles" | "suse-based" | "opensuse" -> SUSE_family > | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> Debian_family > | _ -> assert false;; > > (note the explicit os_family type definition, and the removal of the > backticks from the constructor names) comes back with the type: > > val family2 : string -> os_family = <fun>Yes, this is all fair comment.> Basically converting a string to an enum constant. > > So, what's the reason for the polymorphic variant?I guess you could look at git history, but most likely it was "just because". It's a bit quicker and adding a new cases involves less code churn. I'm not opposed if you want to make the change above if you think it improves readability. In general I try to avoid tricksy features of OCaml, like functors, and this could (arguably) be one of those cases. BTW as an aside the implementation of the two types of constructors is completely different, and polymorphic variants have a slower implementation. For ordinary constructors they generally map to integers, which makes them very efficient (eg. they can be stored in registers etc). Polymorphic variants use a hash of the name instead making them slower overall. (see table in https://rwmj.wordpress.com/2009/08/05/ocaml-internals-part-2-strings-and-other-types/) But a fun thing is you can cause hash collisions (although it is picked up by the compiler - it's still type safe!): # type collision = [`Eric_Cooper | `azdwbie];; Error: Variant tags `azdwbie and `Eric_Cooper have the same hash value. Change one of them. (https://groups.google.com/g/fa.caml/c/68oTntkRLyU) There's even a link-time pass to find hash collisions across the whole program. Rich.> [1] https://ocaml.org/manual/coreexamples.html#s%3Atut-recvariants > [2] https://ocaml.org/manual/polyvariant.html#sec48 > [3] https://ocaml.org/manual/polyvariant.html#s%3Apolyvariant-weaknesses > > Thanks! > Laszlo-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2021-Oct-01 11:21 UTC
[Libguestfs] translating CD-ROM device paths from i440fx to Q35 in virt-v2v
On Fri, Oct 01, 2021 at 12:03:22PM +0100, Richard W.M. Jones wrote:> BTW as an aside the implementation of the two types of constructors is > completely different, and polymorphic variants have a slower > implementation. For ordinary constructors they generally map to > integers, which makes them very efficient (eg. they can be stored in > registers etc). Polymorphic variants use a hash of the name instead > making them slower overall.I mean it's slower not because of the hash - which is computed at compile time - but because various tricks cannot be used at runtime which are applicable to small ints. eg. For the types: type t = AAA | BBB | CCC type pv = [`AAA | `BBB | `CCC] and functions: let ft = function AAA -> 0 | BBB -> 1 | CCC -> 2 let fpv = function `AAA -> 0 | `BBB -> 1 | `CCC -> 2 the first function is an identity (because the input and output have the same runtime representation). It only forces the LSB to be set, for unclear reasons, I think this may be a missed optimization: camlTest__ft_86: sarq $1, %rax ; %rax /= 2 leaq 1(%rax,%rax), %rax ; %rax = 2*%rax+1 ret but the second function requires a bunch of work to deal with the hash values: camlTest__fpv_89: cmpq $6593797, %rax je .L101 cmpq $6693703, %rax jl .L102 movl $5, %eax ; return OCaml int 5 = C int 2 ret .align 4 .L102: movl $1, %eax ; return OCaml int 1 = C int 0 ret .align 4 .L101: movl $3, %eax ; return OCaml int 3 = C int 1 ret Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Laszlo Ersek
2021-Oct-01 11:29 UTC
[Libguestfs] translating CD-ROM device paths from i440fx to Q35 in virt-v2v
On 10/01/21 13:03, Richard W.M. Jones wrote:> On Fri, Oct 01, 2021 at 12:35:50PM +0200, Laszlo Ersek wrote: >> Hi Rich, >> >> (dropping libvirt-devel) >> >> On 09/30/21 13:53, Richard W.M. Jones wrote: >> >>> Also we don't currently try to find or rewrite /dev/disk/ paths in >>> guest configuration files. The only rewriting that happens is for >>> /dev/[hs]d* block device filenames and a few others. The actual code >>> that does this is convert/convert_linux.ml:remap_block_devices >>> >>> So I wouldn't over-think this. It's likely fine to identify such >>> devices and rewrite them as "/dev/cdrom", assuming that (I didn't >>> check) udev creates that symlink for any reasonably modern Linux. And >>> if there's more than one attached CD to the source, only convert the >>> first one and warn about but drop the others. >> >> So I've started reading "convert/convert_linux.ml" in parallel with the >> OCaml manual. (I dabbled for a few weeks in Haskell when everyone else >> did a few years ago, so it's not 100% unfamiliar.) >> >> Question: >> >> let family >> match inspect.i_distro with >> | "fedora" >> | "rhel" | "centos" | "scientificlinux" | "redhat-based" >> | "oraclelinux" -> `RHEL_family >> | "altlinux" -> `ALT_family >> | "sles" | "suse-based" | "opensuse" -> `SUSE_family >> | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family >> >> Here, RHEL_family, ALT_family, SUSE_family, Debian_family are not plain >> constructors ("fixed" variants) [1] but polymorphic ones [2]. >> >> Why? >> >> Was this done *only* in order so that an explicit type definition such >> as >> >> type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family >> >> could be avoided? > > There are two types of variants in OCaml. The first is the normal one > and it requires a type definition. eg: > > type t = A | B | C of int | D of string * int > > with values like ?A? or ?C 3? or ?D ("hello", 1)?. > > These are ones we'd normally use and have the (arguable?) advantage > that you have to write (and therefore, document) the full type somewhere. > > The second type are polymorphic variants. These are actually way more > complicated than I'm going to describe below because they have a load > of set theory behind them (if you really want to know see the docs > that you've already linked to). They are written with backtick before > the name of the variant. The brief idea is that you let the OCaml > compiler work out the type for you. > > For example: > > let f = function 1 -> `A | 2 -> `B | i -> `C i > > creates a function f which returns the type: > > type pv = [ `A | `B | `C of int ] > > Actually if you type this into the toplevel it says: > > $ rlwrap ocaml > OCaml version 4.12.0 > > # let f = function 1 -> `A | 2 -> `B | i -> `C i ;; > val f : int -> [> `A | `B | `C of int ] = <fun> > > As you observe below there's an extra ">" character. That means that > the type is "at least" that one, and there might be more cases, but > the function that I wrote only contains the 3 cases. > > Anyway the short of this is that these open-ended, automatically > resolved types don't require a type definition so they're less > documented but more flexible. > > (And another thing: It's possible to reuse the constructors in > unrelated types. eg. `A might be part of another type. But don't do > this unless it's for a very good reason). > > In all cases the OCaml compiler enforces type safety, so although > these types are less explicit, they're still safe. If I extend my > example above and forget about one of the cases, the compiler will > warn (in the v2v code, we've set up the flags to give a hard error): > > # let g j = match f j with `A -> 1 | `B -> 2 ;; > Warning 8 [partial-match]: this pattern-matching is not exhaustive. > Here is an example of a case that is not matched: > `C _ > val g : int -> int = <fun> > >> Based on my (incomplete understanding) of the ocaml docs, this looks >> like a bad idea. We need no polymorphism here, and using a static type >> (a fixed variant) is generally beneficial [3]. >> >> The following (minimally modified) definition at the OCaml REPL: >> >> let family1 distro >> match distro with >> | "fedora" >> | "rhel" | "centos" | "scientificlinux" | "redhat-based" >> | "oraclelinux" -> `RHEL_family >> | "altlinux" -> `ALT_family >> | "sles" | "suse-based" | "opensuse" -> `SUSE_family >> | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> `Debian_family >> | _ -> assert false;; >> >> deduces the following type: >> >> string -> [> `ALT_family | `Debian_family | `RHEL_family | `SUSE_family ] = <fun> >> >> with the nasty "[>" mark at the start (allowing for further type >> refinement [2], which I think we don't need here). Conversely. >> >> type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family;; >> >> let family2 distro >> match distro with >> | "fedora" >> | "rhel" | "centos" | "scientificlinux" | "redhat-based" >> | "oraclelinux" -> RHEL_family >> | "altlinux" -> ALT_family >> | "sles" | "suse-based" | "opensuse" -> SUSE_family >> | "debian" | "ubuntu" | "linuxmint" | "kalilinux" -> Debian_family >> | _ -> assert false;; >> >> (note the explicit os_family type definition, and the removal of the >> backticks from the constructor names) comes back with the type: >> >> val family2 : string -> os_family = <fun> > > Yes, this is all fair comment. > >> Basically converting a string to an enum constant. >> >> So, what's the reason for the polymorphic variant? > > I guess you could look at git history, but most likely it was "just > because". It's a bit quicker and adding a new cases involves less > code churn. > > I'm not opposed if you want to make the change above if you think it > improves readability.Thanks for the help! I don't want to leave a trail of such patches marking my learning of OCaml :) Just wanted to understand the reason. Laszlo> In general I try to avoid tricksy features of > OCaml, like functors, and this could (arguably) be one of those cases. > > BTW as an aside the implementation of the two types of constructors is > completely different, and polymorphic variants have a slower > implementation. For ordinary constructors they generally map to > integers, which makes them very efficient (eg. they can be stored in > registers etc). Polymorphic variants use a hash of the name instead > making them slower overall. > > (see table in https://rwmj.wordpress.com/2009/08/05/ocaml-internals-part-2-strings-and-other-types/) > > But a fun thing is you can cause hash collisions (although it is > picked up by the compiler - it's still type safe!): > > # type collision = [`Eric_Cooper | `azdwbie];; > Error: Variant tags `azdwbie and `Eric_Cooper have the same hash value. > Change one of them. > > (https://groups.google.com/g/fa.caml/c/68oTntkRLyU) > > There's even a link-time pass to find hash collisions across the whole > program. > > Rich. > >> [1] https://ocaml.org/manual/coreexamples.html#s%3Atut-recvariants >> [2] https://ocaml.org/manual/polyvariant.html#sec48 >> [3] https://ocaml.org/manual/polyvariant.html#s%3Apolyvariant-weaknesses >> >> Thanks! >> Laszlo >