Hi Hannes,

> a) let the device do the timeout: pass in a timeout value with the
> command, and allow the device to return an ETIMEDOUT error when the
> timeout expires. Then it's up to the device to do the necessary timeout
> handling; the server won't be involved at all (except for handling an
> ETIMEDOUT error)

This won't work if the device crashes.

> b) implement an 'abort' command. With that the server controls the
> timeout, and is allowed to send an 'abort' command when the timeout
> expires. That requires the device to be able to abort commands (which
> not all devices are able to), but avoids having to implement a timeout
> handling in the device.

I actually thought about this idea.
This may work, but you'll still have a few moments when the server
assumes that the command failed, and the network device assumes that
it succeeded.
So the server may still receive packets in an unexpected queue.

> I am very much in favour of having timeouts for virtio commands; we've
> had several massive customer escalations which could have been solved if
> we were able to set the command timeout in the VM.
> As this was for virtio-scsi/virtio-block I would advocate to have a
> generic virtio command timeout, not a protocol-specific one.

This may be difficult to implement.
Especially when multiple commands may be queued at the same time, and
the device can handle the commands in any order.
We'll need to add identifiers for every command.

I'm actually referring here to the Linux kernel implementation of
virtnet control commands, in which the server spins for a response.
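The point about several queued commands completing in any order can be made concrete. A per-command timeout only works if each completion can be matched back to the command it belongs to. What follows is a minimal Python sketch built on a hypothetical driver-side pending table keyed by a command tag. The class, its methods and the command names are illustrative only and come from neither the virtio spec nor the Linux driver. Time is passed in explicitly so the example stays deterministic.

```python
import itertools


class PendingTable:
    """Hypothetical driver-side table of in-flight commands.

    Every command gets a unique tag and an absolute deadline. Completions may
    arrive in any order; the tag is what matches them back to the command.
    """

    def __init__(self):
        self._tags = itertools.count(1)  # monotonically increasing tags
        self._pending = {}               # tag -> (command, deadline)

    def submit(self, command, now, timeout):
        """Record a new command and return its tag."""
        tag = next(self._tags)
        self._pending[tag] = (command, now + timeout)
        return tag

    def complete(self, tag):
        """Match a completion to its command.

        Returns None for a tag that is not pending (already completed,
        already timed out, or never issued), i.e. a stale completion.
        """
        entry = self._pending.pop(tag, None)
        return None if entry is None else entry[0]

    def expire(self, now):
        """Drop every command whose deadline has passed; return their tags."""
        overdue = [t for t, (_, deadline) in self._pending.items() if deadline <= now]
        for t in overdue:
            del self._pending[t]
        return overdue


# Two commands in flight (illustrative names); the device completes the second first.
table = PendingTable()
mac = table.submit("set-mac", now=0.0, timeout=1.0)
rx = table.submit("set-rx-mode", now=0.0, timeout=5.0)
print(table.complete(rx))     # set-rx-mode
print(table.expire(now=2.0))  # [1]  (the set-mac tag)
print(table.complete(mac))    # None: late completion for a timed-out command
```

The last line shows the limit of a purely driver-side timeout. The table can recognise the late completion as stale, but the device may still have applied the command. This is the window where the server assumes failure while the device assumes success.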
On 8/24/22 11:42, Alvaro Karsz wrote:
> Hi Hannes,
>
>> a) let the device do the timeout: pass in a timeout value with the
>> command, and allow the device to return an ETIMEDOUT error when the
>> timeout expires. Then it's up to the device to do the necessary timeout
>> handling; the server won't be involved at all (except for handling an
>> ETIMEDOUT error)
>
> This won't work if the device crashes.
>
>> b) implement an 'abort' command. With that the server controls the
>> timeout, and is allowed to send an 'abort' command when the timeout
>> expires. That requires the device to be able to abort commands (which
>> not all devices are able to), but avoids having to implement a timeout
>> handling in the device.
>
> I actually thought about this idea.
> This may work, but you'll still have a few moments when the server
> assumes that the command failed, and the network device assumes that
> it succeeded.
> So the server may still receive packets in an unexpected queue.

No. The server may not assume that the command failed until it gets the
response to the abort command. Before that, the state of the command is
undefined, and the server may not assume anything.
And then we get into the fun topic of timing out aborts, which really
can only be resolved if you have a fool-proof way of resetting the queue
itself. But I guess virtio can do that (right?).

>> I am very much in favour of having timeouts for virtio commands; we've
>> had several massive customer escalations which could have been solved if
>> we were able to set the command timeout in the VM.
>> As this was for virtio-scsi/virtio-block I would advocate to have a
>> generic virtio command timeout, not a protocol-specific one.
>
> This may be difficult to implement.
> Especially when multiple commands may be queued at the same time, and
> the device can handle the commands in any order.
> We'll need to add identifiers for every command.

Why, but of course.
You cannot assume in-order delivery of the completions; in fact, that's
the whole _point_ of having a queue-based I/O command delivery method.

> I'm actually referring here to the Linux kernel implementation of
> virtnet control commands, in which the server spins for a response.

Sheesh. Spinning for a response is never a good idea, as this means
you'll end up with a non-interruptible command in the guest (essentially
an ioctl into the hypervisor).

And really, identifying the command isn't hard. Each command already has
an identifier (namely the virtio ring index), so if in doubt you can
always use that.
To be foolproof, though, you might want to add a 'real' identifier (like
a 32- or 64-bit command tag), which would even allow you to catch
uninitialized / completed commands. This tends to be quite important when
implementing an 'abort' command, as the command referred to by the
'abort' command might have been completed by the time the hypervisor
processes the abort command.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
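The tag-plus-abort idea above, including the race where the command completes before the abort is processed, can be sketched as a small Python model. It assumes a hypothetical device that tracks in-flight commands by tag and answers an abort with one of three outcomes. The enum values, the class and the driver-side helper are illustrative and not part of virtio. Mapping an unknown tag to a queue reset is an assumption of this sketch, loosely following the remark that a fool-proof queue reset is the last resort.

```python
import enum


class AbortResult(enum.Enum):
    ABORTED = "aborted"         # still in flight; it will never complete now
    ALREADY_COMPLETED = "done"  # the completion raced ahead of the abort
    UNKNOWN_TAG = "unknown"     # never issued, e.g. an uninitialized tag


class Device:
    """Hypothetical device model that tracks commands by tag."""

    def __init__(self):
        self.in_flight = {}     # tag -> command
        self.completed = set()  # unbounded here purely for simplicity

    def accept(self, tag, command):
        self.in_flight[tag] = command

    def finish(self, tag):
        del self.in_flight[tag]
        self.completed.add(tag)

    def abort(self, tag):
        if tag in self.in_flight:
            del self.in_flight[tag]
            return AbortResult.ABORTED
        if tag in self.completed:
            return AbortResult.ALREADY_COMPLETED
        return AbortResult.UNKNOWN_TAG


def resolve_after_abort(result):
    """Driver side: the timed-out command's outcome is undefined until this point."""
    if result is AbortResult.ABORTED:
        return "failed"
    if result is AbortResult.ALREADY_COMPLETED:
        return "succeeded"  # the regular completion must still be reaped
    return "reset-queue"    # assumption: treat a confused device as broken


dev = Device()
dev.accept(7, "set-mac")
dev.accept(8, "set-rx-mode")
dev.finish(8)  # tag 8 completes just before the driver's abort arrives
for tag in (7, 8, 42):
    print(tag, resolve_after_abort(dev.abort(tag)))
# 7 failed
# 8 succeeded
# 42 reset-queue
```

The driver never decides "failed" on the timeout alone. Only the abort response moves the command out of the undefined state, and the tag is what tells an aborted command apart from one that has already completed.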
On Wed, Aug 24, 2022 at 5:43 PM Alvaro Karsz <alvaro.karsz at solid-run.com> wrote:
>
> Hi Hannes,
>
> > a) let the device do the timeout: pass in a timeout value with the
> > command, and allow the device to return an ETIMEDOUT error when the
> > timeout expires. Then it's up to the device to do the necessary timeout
> > handling; the server won't be involved at all (except for handling an
> > ETIMEDOUT error)
>
>
> This won't work if the device crashes.

Yes, from the hardening point of view: the driver should not trust or
depend on device behaviour.

> >
> > b) implement an 'abort' command. With that the server controls the
> > timeout, and is allowed to send an 'abort' command when the timeout
> > expires. That requires the device to be able to abort commands (which
> > not all devices are able to), but avoids having to implement a timeout
> > handling in the device.
>
>
> I actually thought about this idea.
> This may work, but you'll still have a few moments when the server
> assumes that the command failed, and the network device assumes that
> it succeeded.
> So the server may still receive packets in an unexpected queue.

Similar to the previous case: the driver should not trust the device to
execute any command correctly.

> >
> > I am very much in favour of having timeouts for virtio commands; we've
> > had several massive customer escalations which could have been solved if
> > we were able to set the command timeout in the VM.
> > As this was for virtio-scsi/virtio-block I would advocate to have a
> > generic virtio command timeout, not a protocol-specific one.
>
> This may be difficult to implement.
> Especially when multiple commands may be queued at the same time, and
> the device can handle the commands in any order.
> We'll need to add identifiers for every command.

Having a timeout that is under the control of the driver might be
possible. Anyhow, this needs to be discussed on the virtio-dev list.
Thanks
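A driver-controlled timeout, as suggested here, replaces an unbounded spin with a bounded wait. Under that arrangement a crashed or hung device can no longer wedge the caller. The following is a minimal userspace Python model of that idea, not kernel code. The helper name `wait_for_completion`, its `poll` callback and the polling interval are hypothetical. The clock and sleep functions are injectable so the wait can be tested deterministically.

```python
import errno
import time


def wait_for_completion(poll, timeout, interval=0.001,
                        clock=time.monotonic, sleep=time.sleep):
    """Bounded wait for a control-command response (hypothetical helper).

    poll() returns the device's status once the command has completed, or
    None while it is still pending. Returns (0, status) on completion, or
    (-errno.ETIMEDOUT, None) once `timeout` seconds pass without a response.
    """
    deadline = clock() + timeout
    while True:
        status = poll()
        if status is not None:
            return 0, status
        if clock() >= deadline:
            return -errno.ETIMEDOUT, None
        sleep(interval)  # give up the CPU between polls instead of spinning


responses = iter([None, None, "ok"])
print(wait_for_completion(lambda: next(responses), timeout=1.0))  # (0, 'ok')
rc, _ = wait_for_completion(lambda: None, timeout=0.01)
print(rc == -errno.ETIMEDOUT)  # True
```

Returning `-ETIMEDOUT` only bounds the wait. As pointed out earlier in the thread, the command's real outcome stays undefined until something like an abort response or a queue reset settles it.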