On Thu, Nov 21, 2019 at 1:39 PM Richard W.M. Jones <rjones@redhat.com>
wrote:>
> [Adding the mailing list address back]
>
> On Thu, Nov 21, 2019 at 12:44:38PM +0200, Nir Soffer wrote:
> > On Thu, Nov 21, 2019 at 12:16 PM Richard W.M. Jones
<rjones@redhat.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 06:00:52PM +0200, Nir Soffer wrote:
> > > > 5. can_zero fails, not sure why (Maybe Richard knows more
about this)
> > > >
> > > > nbdkit: python[1]: debug: python: can_zero
> > > > nbdkit: python[1]: debug: sending error reply: Operation not
supported
> > > >
> > > > rhv-upload-plugin always reports that it can zero, I cannot
explain this.
> > >
> > > I believe this is a bug in nbdkit’s python bindings. Will take a
> > > closer look and reply in a new thread. I don't believe it
affects
> > > anything here however.
> > >
> > > > 6. writing the first sector fails
> > > >
> > > > python[1]: nbdkit: debug: vddk[3]python: pwrite count=65536
offset=0 fua=0:
> > > > ...
> > > > nbdkit: python[1]: error:
> > > > /var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py: pwrite:
error:
> > > > ['Traceback (most recent call last):\n', ' File
> > > > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py",
line 189, in
> > > > pwrite\n http.endheaders()\n', ' File
> > > > "/usr/lib64/python3.7/http/client.py", line 1247,
in endheaders\n
> > > > self._send_output(message_body,
encode_chunked=encode_chunked)\n', '
> > > > File "/usr/lib64/python3.7/http/client.py", line
1026, in
> > > > _send_output\n self.send(msg)\n', ' File
> > > > "/usr/lib64/python3.7/http/client.py", line 966,
in send\n
> > > > self.connect()\n', ' File
> > > > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py",
line 357, in
> > > > connect\n self.sock.connect(self.path)\n',
'ConnectionRefusedError:
> > > > [Errno 111] Connection refused\n']
> > > >
> > > > But it fails in send(), and we don't handle errors
proeply around
> > > > send(), only around getresponse().
> > > > So the error is forgotten by the plugin.
> > >
> > > That sounds bad?
> >
> > Indeed, but we never had this issue in real environment.
> >
> > Basically we should handle any exception in the plugin as failure,
> > but I'm not sure where we should fix it.
> >
> > We can add error handler around all the entry points
> > (e.g. pwrite()), setting h['failed'] = True. But this makes it
> > harder to write correct plugins; everyone has to ensure correct
> > error handling instead of getting it right by default.
>
> In nbdkit-python-plugin, any exception which escapes will be caught in
> the C code and turned into an NBD error on the wire. The function
> which does this is:
>
>
https://github.com/libguestfs/nbdkit/blob/a52ebb4e8d18a73b9ae009cd768bfab18505f59a/plugins/python/python.c#L189
>
> Since each plugin method corresponds* (* = mostly) to a single NBD
> command on the wire, returning an error from the method signals an
> error back to the NBD client. If the NBD client is ‘qemu-img convert’
> then it will exit the first time it sees an error.
>
> (The retry filter http://libguestfs.org/nbdkit-retry-filter.1.html
> changes this behaviour, but we are not using that for output to
> rhv-upload.)
>
> > We can handle this in the python plugin (e.g. h['failed'] =
True)
> > but the handle is controlled by the plugin, so this does not sound
> > like a good way.
>
> The close() method is always called, whether or not there was an
> error. The same connection handle (‘h’) is passed back to the close
> method. It's quite safe and normal to store state in this handle.
>
> nbdkit itself doesn't save the fact that there was an error during the
> connection (it can't since it doesn't know if the error was
> recoverable or not). But rhv-upload-plugin needs to handle failed vs
> successful transfers differently. Thus we save the failure state in
> the handle the first time we see an error.
>
> > Maybe we can change close() to:
> >
> > def close(failed=True):
> > if failed:
> > # cleanup after some call failed
> > ...
> >
> > # Cleanup after successful run
> >
> > The python plugin will remember unhandled errors and will call:
> >
> > close(failed=True)
> >
> > To write correct plugin author must not handle fatal errors in the
plugin.
>
> This changes how the Python plugin would work versus other plugins,
> and the C code can't know if the NBD connection failed, because some
> NBD errors are recoverable. Also the client does not send us any
> error indication when it disconnects.
>
> The answer is probably to wrap every method in a big try...except
> which sets the failed flag in the handle.
Implemented in
https://www.redhat.com/archives/libguestfs/2019-November/msg00147.html
This works with current plugin, and can be backported to downstream if needed.
However it makes it harder to write plugins in python. The basic idea is that
not handling an error in python will crash with a traceback *without* having to
do anything. It would be best if the python plugin ensure this.
If we want to support recoverable errors, we can define a special
RecoverableError
exception that signal the python plugin that this request failed, but
the error is not
fatal.
We can also define more specific errors like UnsupportedOperation, for
example if
plugin's zero() fail and the author wants nbdkit to stop calling
zero() and emulate
it with write.
Nir