Richard W.M. Jones
2022-Feb-09 10:32 UTC
[Libguestfs] [PATCH v2v v2] output: -o rhv-upload: Kill nbdkit instances before finalizing
If nbdkit instance(s) are still running then they will hold open some http connections to imageio. In some versions of imageio, starting finalization in this state causes a 60 second timeout. See-also: https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html Thanks: Nir Soffer --- output/output_rhv_upload.ml | 39 ++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-) diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml index 4d8dc1c135..b500551c5f 100644 --- a/output/output_rhv_upload.ml +++ b/output/output_rhv_upload.ml @@ -280,9 +280,22 @@ e command line has to match the number of guest disk images (for this guest: %d) ignore (Python_script.run_command cancel_script json_params []) in - (* Set up an at-exit handler so we delete the orphan disks on failure. *) + (* Set up an at-exit handler to perform some cleanups. + * - Kill nbdkit PIDs (only before finalization). + * - Delete the orphan disks (only on conversion failure). + *) + let nbdkit_pids = ref [] in On_exit.f ( fun () -> + (* Kill the nbdkit PIDs. *) + List.iter ( + fun pid -> + try + kill pid Sys.sigterm + with exn -> debug "%s" (Printexc.to_string exn) + ) !nbdkit_pids; + nbdkit_pids := []; + (* virt-v2v writes v2vdir/done on success only. *) let success = Sys.file_exists (dir // "done") in if not success then ( @@ -351,11 +364,7 @@ e command line has to match the number of guest disk images (for this guest: %d) if is_ovirt_host then Nbdkit.add_arg cmd "is_ovirt_host" "true"; let _, pid = Nbdkit.run_unix ~socket cmd in - - (* --exit-with-parent should ensure nbdkit is cleaned - * up when we exit, but it's not supported everywhere. - *) - On_exit.kill pid; + List.push_front pid nbdkit_pids ) (List.combine disks disk_uuids); (* Stash some data we will need during finalization. *) @@ -363,7 +372,7 @@ e command line has to match the number of guest disk images (for this guest: %d) let t = (disk_sizes : int64 list), disk_uuids, !transfer_ids, finalize_script, createvm_script, json_params, rhv_storagedomain_uuid, rhv_cluster_uuid, - rhv_cluster_cpu_architecture, rhv_cluster_name in + rhv_cluster_cpu_architecture, rhv_cluster_name, nbdkit_pids in t and rhv_upload_finalize dir source inspect target_meta @@ -374,7 +383,8 @@ and rhv_upload_finalize dir source inspect target_meta (disk_sizes, disk_uuids, transfer_ids, finalize_script, createvm_script, json_params, rhv_storagedomain_uuid, rhv_cluster_uuid, - rhv_cluster_cpu_architecture, rhv_cluster_name) + rhv_cluster_cpu_architecture, rhv_cluster_name, + nbdkit_pids) (* Check the cluster CPU arch matches what we derived about the * guest during conversion. *) @@ -386,6 +396,17 @@ and rhv_upload_finalize dir source inspect target_meta rhv_cluster_name target_meta.guestcaps.gcaps_arch arch ); + (* We must kill all our nbdkit instances before finalizing the + * transfer. See: + * https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html + * + * We want to fail here if the kill fails because nbdkit + * died already, as that would be unexpected. + *) + List.iter (fun pid -> kill pid Sys.sigterm) !nbdkit_pids; + List.iter (fun pid -> ignore (waitpid [] pid)) !nbdkit_pids; + nbdkit_pids := []; (* Don't kill them again in the On_exit handler. *) + (* Finalize all the transfers. *) let json_params let ids = List.map (fun id -> JSON.String id) transfer_ids in @@ -442,7 +463,7 @@ module RHVUpload = struct type t = int64 list * string list * string list * Python_script.script * Python_script.script * JSON.field list * string option * string option * - string option * string + string option * string * int list ref let to_string options "-o rhv-upload" ^ -- 2.32.0
Laszlo Ersek
2022-Feb-09 13:22 UTC
[Libguestfs] [PATCH v2v v2] output: -o rhv-upload: Kill nbdkit instances before finalizing
On 02/09/22 11:32, Richard W.M. Jones wrote:> If nbdkit instance(s) are still running then they will hold open some > http connections to imageio. In some versions of imageio, starting > finalization in this state causes a 60 second timeout. > > See-also: https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html > Thanks: Nir Soffer > --- > output/output_rhv_upload.ml | 39 ++++++++++++++++++++++++++++--------- > 1 file changed, 30 insertions(+), 9 deletions(-) > > diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml > index 4d8dc1c135..b500551c5f 100644 > --- a/output/output_rhv_upload.ml > +++ b/output/output_rhv_upload.ml > @@ -280,9 +280,22 @@ e command line has to match the number of guest disk images (for this guest: %d) > ignore (Python_script.run_command cancel_script json_params []) > in > > - (* Set up an at-exit handler so we delete the orphan disks on failure. *) > + (* Set up an at-exit handler to perform some cleanups. > + * - Kill nbdkit PIDs (only before finalization). > + * - Delete the orphan disks (only on conversion failure). > + *) > + let nbdkit_pids = ref [] in > On_exit.f ( > fun () -> > + (* Kill the nbdkit PIDs. *) > + List.iter ( > + fun pid -> > + try > + kill pid Sys.sigterm > + with exn -> debug "%s" (Printexc.to_string exn) > + ) !nbdkit_pids; > + nbdkit_pids := []; > + > (* virt-v2v writes v2vdir/done on success only. *) > let success = Sys.file_exists (dir // "done") in > if not success then ( > @@ -351,11 +364,7 @@ e command line has to match the number of guest disk images (for this guest: %d) > if is_ovirt_host then > Nbdkit.add_arg cmd "is_ovirt_host" "true"; > let _, pid = Nbdkit.run_unix ~socket cmd in > - > - (* --exit-with-parent should ensure nbdkit is cleaned > - * up when we exit, but it's not supported everywhere. > - *) > - On_exit.kill pid; > + List.push_front pid nbdkit_pids > ) (List.combine disks disk_uuids); > > (* Stash some data we will need during finalization. *) > @@ -363,7 +372,7 @@ e command line has to match the number of guest disk images (for this guest: %d) > let t = (disk_sizes : int64 list), disk_uuids, !transfer_ids, > finalize_script, createvm_script, json_params, > rhv_storagedomain_uuid, rhv_cluster_uuid, > - rhv_cluster_cpu_architecture, rhv_cluster_name in > + rhv_cluster_cpu_architecture, rhv_cluster_name, nbdkit_pids in > t > > and rhv_upload_finalize dir source inspect target_meta > @@ -374,7 +383,8 @@ and rhv_upload_finalize dir source inspect target_meta > (disk_sizes, disk_uuids, transfer_ids, > finalize_script, createvm_script, json_params, > rhv_storagedomain_uuid, rhv_cluster_uuid, > - rhv_cluster_cpu_architecture, rhv_cluster_name) > + rhv_cluster_cpu_architecture, rhv_cluster_name, > + nbdkit_pids) > (* Check the cluster CPU arch matches what we derived about the > * guest during conversion. > *) > @@ -386,6 +396,17 @@ and rhv_upload_finalize dir source inspect target_meta > rhv_cluster_name target_meta.guestcaps.gcaps_arch arch > ); > > + (* We must kill all our nbdkit instances before finalizing the > + * transfer. See: > + * https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html > + * > + * We want to fail here if the kill fails because nbdkit > + * died already, as that would be unexpected. > + *) > + List.iter (fun pid -> kill pid Sys.sigterm) !nbdkit_pids; > + List.iter (fun pid -> ignore (waitpid [] pid)) !nbdkit_pids; > + nbdkit_pids := []; (* Don't kill them again in the On_exit handler. *) > + > (* Finalize all the transfers. *) > let json_params > let ids = List.map (fun id -> JSON.String id) transfer_ids in > @@ -442,7 +463,7 @@ module RHVUpload = struct > type t = int64 list * string list * string list * > Python_script.script * Python_script.script * > JSON.field list * string option * string option * > - string option * string > + string option * string * int list ref > > let to_string options > "-o rhv-upload" ^ >Reviewed-by: Laszlo Ersek <lersek at redhat.com>
Nir Soffer
2022-Feb-09 13:51 UTC
[Libguestfs] [PATCH v2v v2] output: -o rhv-upload: Kill nbdkit instances before finalizing
On Wed, Feb 9, 2022 at 12:32 PM Richard W.M. Jones <rjones at redhat.com> wrote:> > If nbdkit instance(s) are still running then they will hold open some > http connections to imageio. In some versions of imageio, starting > finalization in this state causes a 60 second timeout. > > See-also: https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html > Thanks: Nir Soffer > --- > output/output_rhv_upload.ml | 39 ++++++++++++++++++++++++++++--------- > 1 file changed, 30 insertions(+), 9 deletions(-) > > diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml > index 4d8dc1c135..b500551c5f 100644 > --- a/output/output_rhv_upload.ml > +++ b/output/output_rhv_upload.ml > @@ -280,9 +280,22 @@ e command line has to match the number of guest disk images (for this guest: %d) > ignore (Python_script.run_command cancel_script json_params []) > in > > - (* Set up an at-exit handler so we delete the orphan disks on failure. *) > + (* Set up an at-exit handler to perform some cleanups. > + * - Kill nbdkit PIDs (only before finalization). > + * - Delete the orphan disks (only on conversion failure). > + *) > + let nbdkit_pids = ref [] in > On_exit.f ( > fun () -> > + (* Kill the nbdkit PIDs. *) > + List.iter ( > + fun pid -> > + try > + kill pid Sys.sigterm > + with exn -> debug "%s" (Printexc.to_string exn) > + ) !nbdkit_pids; > + nbdkit_pids := []; > + > (* virt-v2v writes v2vdir/done on success only. *) > let success = Sys.file_exists (dir // "done") in > if not success then ( > @@ -351,11 +364,7 @@ e command line has to match the number of guest disk images (for this guest: %d) > if is_ovirt_host then > Nbdkit.add_arg cmd "is_ovirt_host" "true"; > let _, pid = Nbdkit.run_unix ~socket cmd in > - > - (* --exit-with-parent should ensure nbdkit is cleaned > - * up when we exit, but it's not supported everywhere. > - *) > - On_exit.kill pid; > + List.push_front pid nbdkit_pids > ) (List.combine disks disk_uuids); > > (* Stash some data we will need during finalization. *) > @@ -363,7 +372,7 @@ e command line has to match the number of guest disk images (for this guest: %d) > let t = (disk_sizes : int64 list), disk_uuids, !transfer_ids, > finalize_script, createvm_script, json_params, > rhv_storagedomain_uuid, rhv_cluster_uuid, > - rhv_cluster_cpu_architecture, rhv_cluster_name in > + rhv_cluster_cpu_architecture, rhv_cluster_name, nbdkit_pids in > t > > and rhv_upload_finalize dir source inspect target_meta > @@ -374,7 +383,8 @@ and rhv_upload_finalize dir source inspect target_meta > (disk_sizes, disk_uuids, transfer_ids, > finalize_script, createvm_script, json_params, > rhv_storagedomain_uuid, rhv_cluster_uuid, > - rhv_cluster_cpu_architecture, rhv_cluster_name) > + rhv_cluster_cpu_architecture, rhv_cluster_name, > + nbdkit_pids) > (* Check the cluster CPU arch matches what we derived about the > * guest during conversion. > *) > @@ -386,6 +396,17 @@ and rhv_upload_finalize dir source inspect target_meta > rhv_cluster_name target_meta.guestcaps.gcaps_arch arch > ); > > + (* We must kill all our nbdkit instances before finalizing the > + * transfer. See: > + * https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html > + * > + * We want to fail here if the kill fails because nbdkit > + * died already, as that would be unexpected. > + *) > + List.iter (fun pid -> kill pid Sys.sigterm) !nbdkit_pids; > + List.iter (fun pid -> ignore (waitpid [] pid)) !nbdkit_pids;Do we kill all nbdkits instances or only rhv-upload-plugin? For example for vdsm output we try to access nbkit during finalization. What if the process handles the SIGTERM and does not exit? One example is a deadlock during termination. It will be more robust to wait for termination for a short time, and repeat the process with SIGKILL. Or just use SIGKILL to avoid the extra step. If we handle removal of pid files and sockets, there is no need for clean shutdown.> + nbdkit_pids := []; (* Don't kill them again in the On_exit handler. *) > + > (* Finalize all the transfers. *) > let json_params > let ids = List.map (fun id -> JSON.String id) transfer_ids in > @@ -442,7 +463,7 @@ module RHVUpload = struct > type t = int64 list * string list * string list * > Python_script.script * Python_script.script * > JSON.field list * string option * string option * > - string option * string > + string option * string * int list ref > > let to_string options > "-o rhv-upload" ^ > -- > 2.32.0 >Otherwise the intent of terminating and waiting nbdkit before finalizing sounds like the right way. Nir