The retry filter defaults to 5 retries, but when run with verbose details produces some confusing output: $ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \ null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \ error-file=/tmp/inject --filter=delay rdelay=1 \ --run 'qemu-img convert $nbd out.img' ... nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064 nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064 nbdkit: null[1]: error: injecting EIO error into pread nbdkit: null[1]: debug: retry 6: original errno = 5 nbdkit: null[1]: debug: could not recover after 5 retries The code is correct (there were six attempts, but that corresponds to 5 retries after the initial attempt; comments mention that retry=0 turns the filter into a no-op but even that requires an initial attempt). So all we need to adjust is the output. Fixes: f0f0ec49 Signed-off-by: Eric Blake <eblake@redhat.com> --- I'm pushing this in response to bugzilla.redhat.com/1819240 At the same time, I'm looking at whether qemu-img should be tweaked to have a mode where it gives up on the first EIO error, rather than plowing on through the rest of the image even after the first error. filters/retry/retry.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/filters/retry/retry.c b/filters/retry/retry.c index 11e5313b..db91d7ca 100644 --- a/filters/retry/retry.c +++ b/filters/retry/retry.c @@ -173,7 +173,7 @@ do_retry (struct retry_handle *h, again: /* Log the original errno since it will be lost when we retry. */ - nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); + nbdkit_debug ("retry attempt %d: original errno = %d", data->retry, *err); if (data->retry >= retries) { nbdkit_debug ("could not recover after %d retries", retries); -- 2.26.0.rc2
Richard W.M. Jones
2020-Apr-01 10:12 UTC
[Libguestfs] [nbdkit PATCH] retry: Tweak error message
On Tue, Mar 31, 2020 at 04:00:22PM -0500, Eric Blake wrote:> The retry filter defaults to 5 retries, but when run with verbose > details produces some confusing output: > > $ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \ > null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \ > error-file=/tmp/inject --filter=delay rdelay=1 \ > --run 'qemu-img convert $nbd out.img' > ... > nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064 > nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064 > nbdkit: null[1]: error: injecting EIO error into pread > nbdkit: null[1]: debug: retry 6: original errno = 5 > nbdkit: null[1]: debug: could not recover after 5 retries > > The code is correct (there were six attempts, but that corresponds to > 5 retries after the initial attempt; comments mention that retry=0 > turns the filter into a no-op but even that requires an initial > attempt). So all we need to adjust is the output. > > Fixes: f0f0ec49 > Signed-off-by: Eric Blake <eblake at redhat.com> > --- > > I'm pushing this in response to bugzilla.redhat.com/1819240 > At the same time, I'm looking at whether qemu-img should be tweaked > to have a mode where it gives up on the first EIO error, rather than > plowing on through the rest of the image even after the first error. > > filters/retry/retry.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/filters/retry/retry.c b/filters/retry/retry.c > index 11e5313b..db91d7ca 100644 > --- a/filters/retry/retry.c > +++ b/filters/retry/retry.c > @@ -173,7 +173,7 @@ do_retry (struct retry_handle *h, > > again: > /* Log the original errno since it will be lost when we retry. */ > - nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); > + nbdkit_debug ("retry attempt %d: original errno = %d", data->retry, *err); > > if (data->retry >= retries) { > nbdkit_debug ("could not recover after %d retries", retries);I think adding 1 was correct, so that the retries count from 1. How about moving the message to make it conditional instead, like this? again: /* Log the original errno since it will be lost when we retry. */ if (data->retry < retries) nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); else { nbdkit_debug ("could not recover after %d retries", retries); ... Rich. -- Richard Jones, Virtualization Group, Red Hat people.redhat.com/~rjones Read my programming and virtualization blog: rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. people.redhat.com/~rjones/virt-top
Eric Blake
2020-Apr-01 13:46 UTC
Re: [Libguestfs] [nbdkit PATCH] retry: Tweak error message
On 4/1/20 5:12 AM, Richard W.M. Jones wrote:> On Tue, Mar 31, 2020 at 04:00:22PM -0500, Eric Blake wrote: >> The retry filter defaults to 5 retries, but when run with verbose >> details produces some confusing output: >> >> $ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \ >> null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \ >> error-file=/tmp/inject --filter=delay rdelay=1 \ >> --run 'qemu-img convert $nbd out.img' >> ... >> nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064 >> nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064 >> nbdkit: null[1]: error: injecting EIO error into pread >> nbdkit: null[1]: debug: retry 6: original errno = 5 >> nbdkit: null[1]: debug: could not recover after 5 retries >>>> again: >> /* Log the original errno since it will be lost when we retry. */ >> - nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); >> + nbdkit_debug ("retry attempt %d: original errno = %d", data->retry, *err); >> >> if (data->retry >= retries) { >> nbdkit_debug ("could not recover after %d retries", retries); > > I think adding 1 was correct, so that the retries count from 1.But the problem is the timing of when the messages are printed. And remember, this code is only reached after a failed attempt, not on success. Pre-patch, we have: initial try succeeds - no message or (here, with retries limited to 2): initial try fails: retry 1: original error delay retry 1 fails: retry 2: original error delay retry 2 fails: retry 3: original error could not recover after 2 retries> > How about moving the message to make it conditional instead, > like this? > > again: > /* Log the original errno since it will be lost when we retry. */ > if (data->retry < retries) > nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); > else { > nbdkit_debug ("could not recover after %d retries", retries); > ...with your suggestion, we'd have: initial try fails: retry 1: original error delay retry 1 fails: retry 2: original error delay retry 2 fails: could not recover after 2 retries where we lost the log of the errno of retry 2. Maybe better would be: initial try fails: attempt 0 failed with errno %d, starting delay for retry 1 delay retry 1 fails: attempt 1 failed with errno %d, starting delay for retry 2 delay retry 2 fails: attempt 2 failed with errno %d, could not recover -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org