Richard W.M. Jones
2022-Jul-15 10:41 UTC
[Libguestfs] [PATCH virt-v2v 0/2] -o rhv: Wait for the NBD server to exit to avoid race with unmounting
This is an alternative approach to https://listman.redhat.com/archives/libguestfs/2022-July/029461.html This doesn't include a timeout. We expect that the NBD servers will exit when killed, if not it's a serious bug somewhere. Rich.
Richard W.M. Jones
2022-Jul-15 10:41 UTC
[Libguestfs] [PATCH virt-v2v 1/2] output: Permit output modes to wait on the local NBD server
Output.output_to_local_file is used by several output modes that write to local files or devices. It launches an instance of qemu-nbd or nbdkit connected to the local file. Previously we unconditionally added an On_exit handler to kill the NBD server. This is usually safe because nbdcopy --flush has guaranteed that the data was written through to permanent storage, and so killing the NBD server is just there to prevent orphaned processes. However for output to RHV (-o rhv) we actually need the NBD server to be cleaned up before we exit. See the analysis here: https://bugzilla.redhat.com/show_bug.cgi?id=1953286#c26 Allow the On_exit handler to be overridden for output modes (only -o rhv) which wish to handle this by themselves. --- output/output.mli | 16 ++++++-- output/output.ml | 79 ++++++++++++++++++++------------------ output/output_disk.ml | 4 +- output/output_glance.ml | 2 +- output/output_libvirt.ml | 4 +- output/output_openstack.ml | 2 +- output/output_qemu.ml | 4 +- output/output_rhv.ml | 4 +- output/output_vdsm.ml | 3 +- 9 files changed, 67 insertions(+), 51 deletions(-) diff --git a/output/output.mli b/output/output.mli index c1f0f53d8e..5eda03c74b 100644 --- a/output/output.mli +++ b/output/output.mli @@ -84,13 +84,23 @@ val error_if_disk_count_gt : string -> int -> unit called. *) val output_to_local_file : ?changeuid:((unit -> unit) -> unit) -> - ?compressed:bool -> + ?compressed:bool -> ?on_exit_kill:bool -> Types.output_allocation -> string -> string -> int64 -> string -> - unit + int (** When an output mode wants to create a local file with a particular format (only "raw" or "qcow2" allowed) then - this common function can be used. *) + this common function can be used. + + Exit handling: + + [?on_exit_kill] defaults to true. Most callers can leave that alone, + and also ignore the returned PID. The NBD server will be cleaned up + automatically with no further action required. + + However if you need to wait for the NBD server to be cleaned up, + you should set [~on_exit_kill:false] and then waitpid on the + returned PID. *) val disk_path : string -> string -> int -> string (** For [-o disk|qemu], return the output disk name of the i'th disk, diff --git a/output/output.ml b/output/output.ml index 23c3932d1b..0e57c4c7f6 100644 --- a/output/output.ml +++ b/output/output.ml @@ -70,6 +70,7 @@ let error_if_disk_count_gt dir n error (f_"this output module doesn't support copying more than %d disks") n let output_to_local_file ?(changeuid = fun f -> f ()) ?(compressed = false) + ?(on_exit_kill = true) output_alloc output_format filename size socket (* Check nbdkit is installed and has the required plugin. *) if not (Nbdkit.is_installed ()) then @@ -105,46 +106,50 @@ let output_to_local_file ?(changeuid = fun f -> f ()) ?(compressed = false) fun () -> g#disk_create ?preallocation filename output_format size ); - match output_format with - | "raw" -> - let cmd = Nbdkit.create "file" in - Nbdkit.add_arg cmd "file" filename; - if Nbdkit.version nbdkit_config >= (1, 22, 0) then ( - let cmd = Nbdkit.add_arg cmd "cache" "none" in - cmd - ); - let _, pid = Nbdkit.run_unix socket cmd in + let pid + match output_format with + | "raw" -> + let cmd = Nbdkit.create "file" in + Nbdkit.add_arg cmd "file" filename; + if Nbdkit.version nbdkit_config >= (1, 22, 0) then ( + let cmd = Nbdkit.add_arg cmd "cache" "none" in + cmd + ); + let _, pid = Nbdkit.run_unix socket cmd in + pid - (* --exit-with-parent should ensure nbdkit is cleaned - * up when we exit, but it's not supported everywhere. - *) - On_exit.kill pid + | "qcow2" -> + let cmd + if compressed then ( + let qemu_quote str = String.replace str "," ",," in + let image_opts = [ "driver=compress"; + "file.driver=qcow2"; + "file.file.driver=file"; + "file.file.filename=" ^ qemu_quote filename ] in + let image_opts = String.concat "," image_opts in + let cmd = QemuNBD.create image_opts in + QemuNBD.set_image_opts cmd true; + cmd + ) + else (* not compressed *) ( + let cmd = QemuNBD.create filename in + QemuNBD.set_format cmd (Some "qcow2"); + cmd + ) in + QemuNBD.set_snapshot cmd false; + let _, pid = QemuNBD.run_unix socket cmd in + pid - | "qcow2" -> - let cmd - if compressed then ( - let qemu_quote str = String.replace str "," ",," in - let image_opts = [ "driver=compress"; - "file.driver=qcow2"; - "file.file.driver=file"; - "file.file.filename=" ^ qemu_quote filename ] in - let image_opts = String.concat "," image_opts in - let cmd = QemuNBD.create image_opts in - QemuNBD.set_image_opts cmd true; - cmd - ) - else (* not compressed *) ( - let cmd = QemuNBD.create filename in - QemuNBD.set_format cmd (Some "qcow2"); - cmd - ) in - QemuNBD.set_snapshot cmd false; - let _, pid = QemuNBD.run_unix socket cmd in - On_exit.kill pid + | _ -> + error (f_"output mode only supports raw or qcow2 format (format: %s)") + output_format in - | _ -> - error (f_"output mode only supports raw or qcow2 format (format: %s)") - output_format + if on_exit_kill then + (* Kill the NBD server on exit. (For nbdkit we use --exit-with-parent + * but it's not supported everywhere). + *) + On_exit.kill pid; + pid let disk_path os name i let outdisk = sprintf "%s/%s-sd%s" os name (drive_name i) in diff --git a/output/output_disk.ml b/output/output_disk.ml index abcfcdc0ce..beef22011b 100644 --- a/output/output_disk.ml +++ b/output/output_disk.ml @@ -85,8 +85,8 @@ module Disk = struct (* Create the actual output disk. *) let outdisk = disk_path output_storage output_name i in - output_to_local_file ~compressed output_alloc output_format - outdisk size socket + ignore (output_to_local_file ~compressed output_alloc output_format + outdisk size socket) ) disks let finalize dir options () source inspect target_meta diff --git a/output/output_glance.ml b/output/output_glance.ml index e6a7f78994..40c6bb6993 100644 --- a/output/output_glance.ml +++ b/output/output_glance.ml @@ -85,7 +85,7 @@ module Glance = struct (* Create the actual output disk. *) let outdisk = sprintf "%s/%d" tmpdir i in - output_to_local_file Sparse output_format outdisk size socket + ignore (output_to_local_file Sparse output_format outdisk size socket) ) disks; tmpdir diff --git a/output/output_libvirt.ml b/output/output_libvirt.ml index 04b4c5f81e..1188d560d7 100644 --- a/output/output_libvirt.ml +++ b/output/output_libvirt.ml @@ -130,8 +130,8 @@ module Libvirt_ = struct (* Create the actual output disk. *) let outdisk = target_path // output_name ^ "-sd" ^ (drive_name i) in - output_to_local_file ~compressed output_alloc output_format - outdisk size socket + ignore (output_to_local_file ~compressed output_alloc output_format + outdisk size socket) ) disks; (capabilities_xml, pool_name) diff --git a/output/output_openstack.ml b/output/output_openstack.ml index aa01d5a67c..bbfe53c741 100644 --- a/output/output_openstack.ml +++ b/output/output_openstack.ml @@ -380,7 +380,7 @@ The os-* parameters and environment variables are optional. let socket = sprintf "%s/out%d" dir i in On_exit.unlink socket; - output_to_local_file Sparse "raw" dev size socket + ignore (output_to_local_file Sparse "raw" dev size socket) ) (List.combine disks devices); !volume_ids diff --git a/output/output_qemu.ml b/output/output_qemu.ml index 5788fc4255..b1eb3e9cd0 100644 --- a/output/output_qemu.ml +++ b/output/output_qemu.ml @@ -91,8 +91,8 @@ module QEMU = struct (* Create the actual output disk. *) let outdisk = disk_path output_storage output_name i in - output_to_local_file ~compressed output_alloc output_format - outdisk size socket + ignore (output_to_local_file ~compressed output_alloc output_format + outdisk size socket) ) disks let finalize dir options () source inspect target_meta diff --git a/output/output_rhv.ml b/output/output_rhv.ml index 15a2c14adf..5e3c031e38 100644 --- a/output/output_rhv.ml +++ b/output/output_rhv.ml @@ -175,8 +175,8 @@ module RHV = struct chmod filename 0o666 ) in - output_to_local_file ~changeuid - output_alloc output_format filename size socket + ignore (output_to_local_file ~changeuid + output_alloc output_format filename size socket) ) (List.combine disks filenames); (* Save parameters since we need them during finalization. *) diff --git a/output/output_vdsm.ml b/output/output_vdsm.ml index 23d1b9cd25..c8da47c7c1 100644 --- a/output/output_vdsm.ml +++ b/output/output_vdsm.ml @@ -200,7 +200,8 @@ For each disk you must supply one of each of these options: On_exit.unlink socket; (* Create the actual output disk. *) - output_to_local_file output_alloc output_format filename size socket + ignore (output_to_local_file output_alloc output_format + filename size socket) ) (List.combine disks filenames); (* Save parameters since we need them during finalization. *) -- 2.37.0.rc2
Richard W.M. Jones
2022-Jul-15 10:41 UTC
[Libguestfs] [PATCH virt-v2v 2/2] -o rhv: Wait for the NBD server to exit to avoid a race with unmounting
https://bugzilla.redhat.com/show_bug.cgi?id=1953286#c26 --- output/output_rhv.ml | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/output/output_rhv.ml b/output/output_rhv.ml index 5e3c031e38..f57ac9099e 100644 --- a/output/output_rhv.ml +++ b/output/output_rhv.ml @@ -175,8 +175,21 @@ module RHV = struct chmod filename 0o666 ) in - ignore (output_to_local_file ~changeuid - output_alloc output_format filename size socket) + let pid = output_to_local_file ~changeuid ~on_exit_kill:false + output_alloc output_format filename size socket in + + (* We have to wait for the NBD server to exit rather than just + * killing it, otherwise it races with unmounting. See: + * https://bugzilla.redhat.com/show_bug.cgi?id=1953286#c26 + *) + On_exit.f ( + fun () -> + kill pid Sys.sigterm; + (* Errors from the NBD server don't matter. On successful + * completion we've already committed the data to disk. + *) + ignore (waitpid [] pid) + ); ) (List.combine disks filenames); (* Save parameters since we need them during finalization. *) -- 2.37.0.rc2