Nir Soffer
2021-Oct-22 22:25 UTC
[Libguestfs] [PATCH libnbd] lib/poll.c: Retry poll after EINTR
I see a rare random failure when calling BlockStatus via the Go binding:

    block_status: nbd_block_status: poll: Interrupted system call

I could not reproduce this with "nbdinfo --map", even after modifying it
to call nbd_block_status() for every 128 MiB.

Fixing this in nbd_unlocked_poll() avoids this issue in the entire
library, when we wait for command completion. This seems more useful
than fixing it in all libnbd clients.

Tested using a Go client listing all extents in an image, calling
BlockStatus for every 128 MiB with a Fedora 34 qcow2 image. Without this
fix, this was always failing.

    $ hyperfine -r1000 --show-output "./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null"
    Benchmark 1: ./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null
      Time (mean ± σ):     31.6 ms ±  3.1 ms    [User: 8.8 ms, System: 7.2 ms]
      Range (min … max):   26.1 ms … 52.3 ms    1000 runs

Signed-off-by: Nir Soffer <nsoffer at redhat.com>
---
 lib/poll.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/poll.c b/lib/poll.c
index edfcc59..df01d94 100644
--- a/lib/poll.c
+++ b/lib/poll.c
@@ -57,8 +57,11 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout)
    * would allow other threads to close file descriptors which we have
    * passed to poll.
    */
-  r = poll (fds, 1, timeout);
-  debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
+  do {
+    r = poll (fds, 1, timeout);
+    debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
+  } while (r == -1 && errno == EINTR);
+
   if (r == -1) {
     set_error (errno, "poll");
     return -1;
--
2.31.1
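For reference, below is a minimal sketch of the kind of test client described
above. It assumes the generated Go binding API (Create, AddMetaContext,
ConnectUri, GetSize, BlockStatus) and its ExtentCallback signature; the file
name, output format and error handling are illustrative rather than the exact
client used for the measurement.

// client.go - list all extents of an NBD export, querying block
// status in 128 MiB steps.  Usage: ./client nbd+unix://?socket=...
package main

import (
	"fmt"
	"os"

	"libguestfs.org/libnbd"
)

const step = 128 * 1024 * 1024 // 128 MiB per BlockStatus call

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintf(os.Stderr, "usage: %s URI\n", os.Args[0])
		os.Exit(1)
	}

	h, err := libnbd.Create()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer h.Close()

	// Ask the server for allocation information.
	if err := h.AddMetaContext("base:allocation"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := h.ConnectUri(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	size, err := h.GetSize()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Print each extent as "offset length flags"; entries come in
	// (length, flags) pairs.
	cb := func(metacontext string, offset uint64, entries []uint32, errp *int) int {
		for i := 0; i+1 < len(entries); i += 2 {
			fmt.Printf("%d %d %d\n", offset, entries[i], entries[i+1])
			offset += uint64(entries[i])
		}
		return 0
	}

	for offset := uint64(0); offset < size; offset += step {
		length := uint64(step)
		if offset+length > size {
			length = size - offset
		}
		if err := h.BlockStatus(length, offset, cb, nil); err != nil {
			// Without the poll fix this sometimes fails with
			// "poll: Interrupted system call".
			fmt.Fprintf(os.Stderr, "block_status: %s\n", err)
			os.Exit(1)
		}
	}
}

This matches the hyperfine invocation above, which passes the nbd+unix URI as
the only argument.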
Nir Soffer
2021-Oct-22 22:58 UTC
[Libguestfs] [PATCH libnbd] lib/poll.c: Retry poll after EINTR
On Sat, Oct 23, 2021 at 1:25 AM Nir Soffer <nirsof at gmail.com> wrote:
>
> I see a rare random failure when calling BlockStatus via the Go binding:
>
>     block_status: nbd_block_status: poll: Interrupted system call
>
> I could not reproduce this with "nbdinfo --map", even after modifying it
> to call nbd_block_status() for every 128 MiB.
>
> Fixing this in nbd_unlocked_poll() avoids this issue in the entire
> library, when we wait for command completion. This seems more useful
> than fixing it in all libnbd clients.

An alternative is to fix this in the Go binding. Looking at a related fix
in the Go standard library, Go code does not expect to handle EINTR:
https://go-review.googlesource.com/c/go/+/232862/

I looked briefly at the generator - it looks like we need to change this
code in generator/GoLang.ml:

  pr "  ret = nbd_%s " name;
  C.print_arg_list ~wrap:true ~handle:true ~types:false args optargs;
  pr ";\n";
  (match errcode with
   | None -> ()
   | Some errcode ->
      pr "  if (ret == %s)\n" errcode;
      pr "    save_error (err);\n";

to add a retry loop for all wrappers.

> Tested using a Go client listing all extents in an image, calling
> BlockStatus for every 128 MiB with a Fedora 34 qcow2 image. Without this
> fix, this was always failing.
>
>     $ hyperfine -r1000 --show-output "./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null"
>     Benchmark 1: ./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null
>       Time (mean ± σ):     31.6 ms ±  3.1 ms    [User: 8.8 ms, System: 7.2 ms]
>       Range (min … max):   26.1 ms … 52.3 ms    1000 runs
>
> Signed-off-by: Nir Soffer <nsoffer at redhat.com>
> ---
>  lib/poll.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/poll.c b/lib/poll.c
> index edfcc59..df01d94 100644
> --- a/lib/poll.c
> +++ b/lib/poll.c
> @@ -57,8 +57,11 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout)
>     * would allow other threads to close file descriptors which we have
>     * passed to poll.
>     */
> -  r = poll (fds, 1, timeout);
> -  debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
> +  do {
> +    r = poll (fds, 1, timeout);
> +    debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
> +  } while (r == -1 && errno == EINTR);
> +
>    if (r == -1) {
>      set_error (errno, "poll");
>      return -1;
> --
> 2.31.1
>
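The retry loop suggested here would be emitted into the generated C wrappers
by that GoLang.ml code. Purely to illustrate the shape of the idea at the Go
level, the sketch below wraps a call in a retry helper. It assumes the binding
reports errors as *libnbd.LibnbdError carrying the underlying C errno (an
assumption about the generated error type), and note that, like the
wrapper-level loop, it re-issues the whole libnbd call rather than only the
interrupted poll, which is what the lib/poll.c patch retries.

// Package eintrretry sketches the per-wrapper retry idea at the Go
// level.  This is illustrative only; in the proposal above the retry
// loop would live in the C wrappers generated by generator/GoLang.ml.
package eintrretry

import (
	"syscall"

	"libguestfs.org/libnbd"
)

// RetryEINTR re-runs call while it keeps failing with EINTR.  It
// assumes errors are returned as *libnbd.LibnbdError with an Errno
// field (an assumption, not confirmed in this thread).
func RetryEINTR(call func() error) error {
	for {
		err := call()
		if le, ok := err.(*libnbd.LibnbdError); ok && le.Errno == syscall.EINTR {
			continue // interrupted by a signal; try again
		}
		return err
	}
}

A caller would then have to wrap every call, for example
err := RetryEINTR(func() error { return h.BlockStatus(length, offset, cb, nil) });
avoiding exactly this kind of boilerplate in every client is the argument for
fixing it in nbd_unlocked_poll() instead.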
Richard W.M. Jones
2021-Oct-23 10:10 UTC
[Libguestfs] [PATCH libnbd] lib/poll.c: Retry poll after EINTR
On Sat, Oct 23, 2021 at 01:25:21AM +0300, Nir Soffer wrote:
> I see a rare random failure when calling BlockStatus via the Go binding:
>
>     block_status: nbd_block_status: poll: Interrupted system call
>
> I could not reproduce this with "nbdinfo --map", even after modifying it
> to call nbd_block_status() for every 128 MiB.
>
> Fixing this in nbd_unlocked_poll() avoids this issue in the entire
> library, when we wait for command completion. This seems more useful
> than fixing it in all libnbd clients.
>
> Tested using a Go client listing all extents in an image, calling
> BlockStatus for every 128 MiB with a Fedora 34 qcow2 image. Without this
> fix, this was always failing.
>
>     $ hyperfine -r1000 --show-output "./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null"
>     Benchmark 1: ./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null
>       Time (mean ± σ):     31.6 ms ±  3.1 ms    [User: 8.8 ms, System: 7.2 ms]
>       Range (min … max):   26.1 ms … 52.3 ms    1000 runs
>
> Signed-off-by: Nir Soffer <nsoffer at redhat.com>
> ---
>  lib/poll.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/poll.c b/lib/poll.c
> index edfcc59..df01d94 100644
> --- a/lib/poll.c
> +++ b/lib/poll.c
> @@ -57,8 +57,11 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout)
>     * would allow other threads to close file descriptors which we have
>     * passed to poll.
>     */
> -  r = poll (fds, 1, timeout);
> -  debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
> +  do {
> +    r = poll (fds, 1, timeout);
> +    debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
> +  } while (r == -1 && errno == EINTR);
> +
>    if (r == -1) {
>      set_error (errno, "poll");
>      return -1;

I think this is a perfectly reasonable solution, and better than
modifying the language bindings.  I guess Eric should have the final
word here, but from my point of view ACK.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html