Hello everyone,

I'm looking at an issue where guests freeze (process state Dl) during a block disk mirror from one storage to another (NFS); the guest's network stack can freeze for up to 10 seconds.

Looking at the storage and I/O, I see good throughput and low latency (<3 ms), and I am having trouble tracking down the source of the issue, as neither storage nor networking shows problems. Interestingly, when I run the same test with virtio-blk, I do not really see process freezes at the same frequency or duration as with virtio-scsi, which seems to indicate a client-side rather than a storage-side problem.

I looked at the syscalls and nothing stood out:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.51   20.672654        8339      2479           ioctl
 27.81   20.162714        3379      5967        31 futex
 22.02   15.964498         785     20335           poll
 15.22   11.038403         150     73561           io_submit
  4.17    3.023285          41     73540           lseek
  1.20    0.868003           5    158591           write
  0.63    0.459030          11     42871           ppoll
  0.22    0.159263           8     19314           recvmsg
  0.16    0.115520           5     22526           read
  0.04    0.029149       29149         1           restart_syscall
  0.01    0.009252          28       330           sendmsg
  0.00    0.001221        1221         1           munmap
  0.00    0.000458          22        21           fcntl
  0.00    0.000286          95         3           openat
  0.00    0.000166           5        32           rt_sigprocmask
  0.00    0.000103          10        10           fdatasync
  0.00    0.000099          25         4           clone
  0.00    0.000081           7        12           mmap
  0.00    0.000077          19         4           close
  0.00    0.000076           6        12           mprotect
  0.00    0.000056          14         4           madvise
  0.00    0.000025           6         4           set_robust_list
  0.00    0.000023           6         4           prctl
------ ----------- ----------- --------- --------- ----------------
100.00   72.504442                419626        31 total

Does anyone have an idea how to better debug this issue?

Thanks
Bjoern
On Thu, Apr 14, 2022 at 16:36:38 +0000, Bjoern Teipel wrote:
> Hello everyone,

Hi,

>
> I'm looking at an issue where I do see guests freezing (Dl) process state during a block disk mirror from one storage to another storage (NFS) where the network stack of the guest can freeze for up to 10 seconds.
> Looking at the storage and IO I noticed good throughput ad low latency <3ms and I am having trouble to track down the source for the issue, as neither storage nor networking show issues. Interestingly when I do the same test with virtio-blk I do not really see the process freezes at the frequency or duration compared to virtio-scsi which seem to indicate a client side rather than storage side problem.

Hmm, this is really weird if the difference is in the guest-facing device frontend.

Since libvirt is merely setting up the block job for the copy and the copy itself is handled by qemu I suggest you contact the qemu-block at nongnu.org mailing list.

Unfortunately you didn't provide any information on the disk configuration (the VM XML) or how you start the blockjob, which I could translate for you into qemu specifics. If you provide such information I can do that to ensure that the qemu folks have all the relevant information.
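
When forwarding this to the qemu folks, it may help to attach one `strace -c` summary per frontend (virtio-scsi and virtio-blk) and compare the average time per call, since that is where ioctl and futex stand out in the table from the original message. The following is a minimal sketch of such a comparison aid, not part of libvirt or qemu: `parse_strace_summary` and `Row` are hypothetical names introduced here, and the sample contains only the first four rows and the total line of the posted table.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    pct_time: float
    seconds: float
    usecs_per_call: int
    calls: int
    errors: int
    syscall: str


def parse_strace_summary(text: str) -> List[Row]:
    """Parse the per-syscall rows of an `strace -c` summary table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        # The trailing "total" row is skipped on purpose.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            pct, secs = float(fields[0]), float(fields[1])
            usecs, calls = int(fields[2]), int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            # Header and separator lines are not numeric; ignore them.
            continue
        rows.append(Row(pct, secs, usecs, calls, errors, fields[-1]))
    return rows


# Excerpt of the summary posted in the original message.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.51   20.672654        8339      2479           ioctl
 27.81   20.162714        3379      5967        31 futex
 22.02   15.964498         785     20335           poll
 15.22   11.038403         150     73561           io_submit
------ ----------- ----------- --------- --------- ----------------
100.00   72.504442                419626        31 total
"""

if __name__ == "__main__":
    rows = parse_strace_summary(SAMPLE)
    # Rank by average latency per call rather than by total time.
    for row in sorted(rows, key=lambda r: r.usecs_per_call, reverse=True):
        print(f"{row.syscall:<12} {row.usecs_per_call:>8} us/call  {row.calls:>8} calls")
```

Ranking by `usecs_per_call` instead of `% time` is a design choice for this sketch: a call that is rare but slow is a more plausible candidate for a multi-second stall than one that is merely frequent.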