Hi,

as a follow-up to BZ #1883399 [1], we are reviewing the vdsm VM migration flows and fixing a few follow-up bugs, e.g. BZ #1981079 [2]. I have a couple of questions related to libvirt:

* If we run a disk extend during migration, the migration can finish sooner than the disk extend. In that case we try to set the disk threshold on an already stopped VM (we handle the libvirt event that the VM was stopped, but due to the Python GIL there can be a delay between receiving the signal from libvirt and handling it), and we get VIR_ERR_OPERATION_INVALID from libvirt when setting the disk threshold. Is it safe to catch this exception and ignore it, or is it thrown for various reasons, so that the root cause can be something other than a stopped VM?

* After a disk extend, we resume the VM if it is stopped (usually because it ran out of disk space). Is it safe to do so also when the disk extend happens during migration and the VM may be stopped because it has already been migrated? I.e. can we assume that libvirt will handle this situation and won't resume the VM in such a case? We do some checks before resuming and try to avoid resuming a migrated VM, but there can be corner cases, and it would be useful to know whether we can rely on libvirt to prevent resuming the VM in unwanted cases, such as when the VM is stopped after migration.

Thanks
Vojta

[1] https://bugzilla.redhat.com/1883399
[2] https://bugzilla.redhat.com/1981079
On Mon, Aug 02, 2021 at 14:20:44 +0200, Vojtech Juranek wrote:
> Hi,
> as a follow-up of BZ #1883399 [1], we are reviewing vdsm VM migration flows and
> solve few follow-up bugs, e.g. BZ #1981079 [2]. I have couple of questions
> related to libvirt:
>
> * if we run disk extend during migration, it can happen that migration finishes
> sooner than disk extend. In such case we will try to set disk threshold on
> already stopped VM (we handle libvirt event that VM was stopped, but due to
> Python GIL there can be a delay between obtaining appropriate signal from
> libvirt and handling it). In such case we get libvirt
> VIR_ERR_OPERATION_INVALID when setting disk threshold. Is it safe to
> catch this exception and ignore it or it's thrown for various reasons and the
> root cause can be something else than stopped VM?

The API to set the block threshold level can return the following errors, listed here with the cases in which each can happen:

VIR_ERR_OPERATION_UNSUPPORTED <- unlikely, new qemu supports it
VIR_ERR_INVALID_ARG           <- disk was not found in VM definition
VIR_ERR_INTERNAL_ERROR        <- on error from qemu

Thus VIR_ERR_OPERATION_INVALID seems to be safe to ignore in your specific case, while not ignoring the others can be used to catch problems.
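
For illustration, a minimal sketch of how the caller side could apply this with the libvirt-python bindings: ignore only VIR_ERR_OPERATION_INVALID and re-raise everything else. The helper name set_threshold_tolerating_stopped_vm is hypothetical and is not taken from vdsm; the stand-in class exists only so the sketch also runs where the bindings are not installed, and its numeric error code is a placeholder assumption.

```python
try:
    import libvirt  # libvirt-python bindings
except ImportError:
    # Stand-in (assumption): lets the sketch run without the bindings.
    class _LibvirtStandIn:
        # Placeholder value, used only by this stand-in.
        VIR_ERR_OPERATION_INVALID = 55

        class libvirtError(Exception):
            def get_error_code(self):
                return None

    libvirt = _LibvirtStandIn


def set_threshold_tolerating_stopped_vm(dom, dev, threshold):
    """Set the block threshold on `dev` of domain `dom`.

    Returns True if the threshold was set and False if libvirt rejected
    the call with VIR_ERR_OPERATION_INVALID (e.g. the VM is no longer
    running on this host). Any other libvirt error is re-raised, so
    real problems stay visible.
    """
    try:
        dom.setBlockThreshold(dev, threshold)
    except libvirt.libvirtError as e:
        if e.get_error_code() == libvirt.VIR_ERR_OPERATION_INVALID:
            return False
        raise
    return True
```

Returning a boolean rather than silently swallowing the error lets the caller log the skipped update or re-check the VM state before deciding what to do next.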