Eric Blake
2020-Apr-01 13:46 UTC
Re: [Libguestfs] [nbdkit PATCH] retry: Tweak error message
On 4/1/20 5:12 AM, Richard W.M. Jones wrote:> On Tue, Mar 31, 2020 at 04:00:22PM -0500, Eric Blake wrote: >> The retry filter defaults to 5 retries, but when run with verbose >> details produces some confusing output: >> >> $ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \ >> null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \ >> error-file=/tmp/inject --filter=delay rdelay=1 \ >> --run 'qemu-img convert $nbd out.img' >> ... >> nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064 >> nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064 >> nbdkit: null[1]: error: injecting EIO error into pread >> nbdkit: null[1]: debug: retry 6: original errno = 5 >> nbdkit: null[1]: debug: could not recover after 5 retries >>>> again: >> /* Log the original errno since it will be lost when we retry. */ >> - nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); >> + nbdkit_debug ("retry attempt %d: original errno = %d", data->retry, *err); >> >> if (data->retry >= retries) { >> nbdkit_debug ("could not recover after %d retries", retries); > > I think adding 1 was correct, so that the retries count from 1.But the problem is the timing of when the messages are printed. And remember, this code is only reached after a failed attempt, not on success. Pre-patch, we have: initial try succeeds - no message or (here, with retries limited to 2): initial try fails: retry 1: original error delay retry 1 fails: retry 2: original error delay retry 2 fails: retry 3: original error could not recover after 2 retries> > How about moving the message to make it conditional instead, > like this? > > again: > /* Log the original errno since it will be lost when we retry. */ > if (data->retry < retries) > nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); > else { > nbdkit_debug ("could not recover after %d retries", retries); > ...with your suggestion, we'd have: initial try fails: retry 1: original error delay retry 1 fails: retry 2: original error delay retry 2 fails: could not recover after 2 retries where we lost the log of the errno of retry 2. Maybe better would be: initial try fails: attempt 0 failed with errno %d, starting delay for retry 1 delay retry 1 fails: attempt 1 failed with errno %d, starting delay for retry 2 delay retry 2 fails: attempt 2 failed with errno %d, could not recover -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Apr-02 09:40 UTC
Re: [Libguestfs] [nbdkit PATCH] retry: Tweak error message
On Wed, Apr 01, 2020 at 08:46:53AM -0500, Eric Blake wrote:> On 4/1/20 5:12 AM, Richard W.M. Jones wrote: > >On Tue, Mar 31, 2020 at 04:00:22PM -0500, Eric Blake wrote: > >>The retry filter defaults to 5 retries, but when run with verbose > >>details produces some confusing output: > >> > >>$ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \ > >> null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \ > >> error-file=/tmp/inject --filter=delay rdelay=1 \ > >> --run 'qemu-img convert $nbd out.img' > >>... > >>nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064 > >>nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064 > >>nbdkit: null[1]: error: injecting EIO error into pread > >>nbdkit: null[1]: debug: retry 6: original errno = 5 > >>nbdkit: null[1]: debug: could not recover after 5 retries > >> > > >> again: > >> /* Log the original errno since it will be lost when we retry. */ > >>- nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); > >>+ nbdkit_debug ("retry attempt %d: original errno = %d", data->retry, *err); > >> > >> if (data->retry >= retries) { > >> nbdkit_debug ("could not recover after %d retries", retries); > > > >I think adding 1 was correct, so that the retries count from 1. > > But the problem is the timing of when the messages are printed. And > remember, this code is only reached after a failed attempt, not on > success. Pre-patch, we have: > > initial try succeeds - no message > > or (here, with retries limited to 2): > > initial try fails: > retry 1: original error > delay > retry 1 fails: > retry 2: original error > delay > retry 2 fails: > retry 3: original error > could not recover after 2 retries > > > > >How about moving the message to make it conditional instead, > >like this? > > > >again: > > /* Log the original errno since it will be lost when we retry. */ > > if (data->retry < retries) > > nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); > > else { > > nbdkit_debug ("could not recover after %d retries", retries); > > ... > > with your suggestion, we'd have: > > initial try fails: > retry 1: original error > delay > retry 1 fails: > retry 2: original error > delay > retry 2 fails: > could not recover after 2 retries > > where we lost the log of the errno of retry 2.It's not printed, but also it's not lost. do_retry will immediately return false, causing the called function to return this final errno (hopefully back to the client). The reason for printing the "original errno" was because those intermediate errnos would otherwise be lost. Attached is my suggested patch - it's a small evolution on top of yours. The output is now: $ rm -f /tmp/inject ; (sleep 5s; touch /tmp/inject ) & ./nbdkit -f -v -U - null size=1G --filter=retry --filter=noextents --filter=error error-rate=100% error-file=/tmp/inject --filter=delay rdelay=1 retries=3 --run 'qemu-img convert $nbd /var/tmp/out.img' nbdkit: null[1]: error: injecting EIO error into pread nbdkit: null[1]: debug: pread failed: original errno = 5 nbdkit: null[1]: debug: retry 1: waiting 2 seconds before retrying nbdkit: null[1]: error: injecting EIO error into pread nbdkit: null[1]: debug: pread failed: original errno = 5 nbdkit: null[1]: debug: retry 2: waiting 4 seconds before retrying nbdkit: null[1]: error: injecting EIO error into pread nbdkit: null[1]: debug: pread failed: original errno = 5 nbdkit: null[1]: debug: retry 3: waiting 8 seconds before retrying nbdkit: null[1]: error: injecting EIO error into pread nbdkit: null[1]: debug: could not recover after 3 retries nbdkit: null[1]: debug: sending error reply: Input/output error Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Eric Blake
2020-Apr-02 12:38 UTC
Re: [Libguestfs] [nbdkit PATCH] retry: Tweak error message
On 4/2/20 4:40 AM, Richard W.M. Jones wrote:>> with your suggestion, we'd have: >> >> initial try fails: >> retry 1: original error >> delay >> retry 1 fails: >> retry 2: original error >> delay >> retry 2 fails: >> could not recover after 2 retries >> >> where we lost the log of the errno of retry 2. > > It's not printed, but also it's not lost. do_retry will immediately > return false, causing the called function to return this final errno > (hopefully back to the client). The reason for printing the "original > errno" was because those intermediate errnos would otherwise be lost.Good point.> > Attached is my suggested patch - it's a small evolution on top of > yours. The output is now: > > $ rm -f /tmp/inject ; (sleep 5s; touch /tmp/inject ) & ./nbdkit -f -v -U - null size=1G --filter=retry --filter=noextents --filter=error error-rate=100% error-file=/tmp/inject --filter=delay rdelay=1 retries=3 --run 'qemu-img convert $nbd /var/tmp/out.img' > > nbdkit: null[1]: error: injecting EIO error into pread > nbdkit: null[1]: debug: pread failed: original errno = 5 > nbdkit: null[1]: debug: retry 1: waiting 2 seconds before retrying > > nbdkit: null[1]: error: injecting EIO error into pread > nbdkit: null[1]: debug: pread failed: original errno = 5 > nbdkit: null[1]: debug: retry 2: waiting 4 seconds before retrying > > nbdkit: null[1]: error: injecting EIO error into pread > nbdkit: null[1]: debug: pread failed: original errno = 5 > nbdkit: null[1]: debug: retry 3: waiting 8 seconds before retrying > > nbdkit: null[1]: error: injecting EIO error into pread > nbdkit: null[1]: debug: could not recover after 3 retries > nbdkit: null[1]: debug: sending error reply: Input/output error >Yes, this is looking better. Let's go for it. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org