Heya,

We've recently hit a number of failures on QEMU P2P live migrations which appear to be caused by transient network disconnects at different points in the migration process. We would like to implement smarter retry logic in our control plane to ensure such issues don't stall critical workflows. On the other hand, we cannot blindly retry every failed migration, because doing so greatly lengthens the time it takes high-level automation to fail when there is a real problem.

Are there currently any generally understood best practices for retrying migrations from a control plane perspective? Ideally we would decide whether or not to retry based on error codes, but many generic codes are returned, especially in the QEMU P2P migration path. For example, see [1], where we attempted to improve an error code for a likely retryable set of failure cases.

[1] https://listman.redhat.com/archives/libvir-list/2022-January/msg00217.html

Thanks,
Raphael