Jason Wang
2023-May-16 02:44 UTC
[PATCH net-next V2 1/2] virtio-net: convert rx mode setting to use workqueue
On Mon, May 15, 2023 at 6:17?PM Michael S. Tsirkin <mst at redhat.com> wrote:> > On Mon, May 15, 2023 at 01:13:33PM +0800, Jason Wang wrote: > > On Mon, May 15, 2023 at 12:45?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > On Mon, May 15, 2023 at 09:05:54AM +0800, Jason Wang wrote: > > > > On Wed, May 10, 2023 at 1:33?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > > > On Mon, Apr 17, 2023 at 11:40:58AM +0800, Jason Wang wrote: > > > > > > On Fri, Apr 14, 2023 at 3:21?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > > > > > > > On Fri, Apr 14, 2023 at 01:04:15PM +0800, Jason Wang wrote: > > > > > > > > Forget to cc netdev, adding. > > > > > > > > > > > > > > > > On Fri, Apr 14, 2023 at 12:25?AM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Apr 13, 2023 at 02:40:26PM +0800, Jason Wang wrote: > > > > > > > > > > This patch convert rx mode setting to be done in a workqueue, this is > > > > > > > > > > a must for allow to sleep when waiting for the cvq command to > > > > > > > > > > response since current code is executed under addr spin lock. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang at redhat.com> > > > > > > > > > > > > > > > > > > I don't like this frankly. This means that setting RX mode which would > > > > > > > > > previously be reliable, now becomes unreliable. > > > > > > > > > > > > > > > > It is "unreliable" by design: > > > > > > > > > > > > > > > > void (*ndo_set_rx_mode)(struct net_device *dev); > > > > > > > > > > > > > > > > > - first of all configuration is no longer immediate > > > > > > > > > > > > > > > > Is immediate a hard requirement? I can see a workqueue is used at least: > > > > > > > > > > > > > > > > mlx5e, ipoib, efx, ... > > > > > > > > > > > > > > > > > and there is no way for driver to find out when > > > > > > > > > it actually took effect > > > > > > > > > > > > > > > > But we know rx mode is best effort e.g it doesn't support vhost and we > > > > > > > > survive from this for years. > > > > > > > > > > > > > > > > > - second, if device fails command, this is also not > > > > > > > > > propagated to driver, again no way for driver to find out > > > > > > > > > > > > > > > > > > VDUSE needs to be fixed to do tricks to fix this > > > > > > > > > without breaking normal drivers. > > > > > > > > > > > > > > > > It's not specific to VDUSE. For example, when using virtio-net in the > > > > > > > > UP environment with any software cvq (like mlx5 via vDPA or cma > > > > > > > > transport). > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Hmm. Can we differentiate between these use-cases? > > > > > > > > > > > > It doesn't look easy since we are drivers for virtio bus. Underlayer > > > > > > details were hidden from virtio-net. > > > > > > > > > > > > Or do you have any ideas on this? > > > > > > > > > > > > Thanks > > > > > > > > > > I don't know, pass some kind of flag in struct virtqueue? > > > > > "bool slow; /* This vq can be very slow sometimes. Don't wait for it! */" > > > > > > > > > > ? > > > > > > > > > > > > > So if it's slow, sleep, otherwise poll? > > > > > > > > I feel setting this flag might be tricky, since the driver doesn't > > > > know whether or not it's really slow. E.g smartNIC vendor may allow > > > > virtio-net emulation over PCI. > > > > > > > > Thanks > > > > > > driver will have the choice, depending on whether > > > vq is deterministic or not. > > > > Ok, but the problem is, such booleans are only useful for virtio ring > > codes. But in this case, virtio-net knows what to do for cvq. So I'm > > not sure who the user is. > > > > Thanks > > Circling back, what exactly does the architecture you are trying > to fix look like? Who is going to introduce unbounded latency? > The hypervisor?Hypervisor is one of the possible reason, we have many more: Hardware device that provides virtio-pci emulation. Userspace devices like VDUSE.> If so do we not maybe want a new feature bit > that documents this? Hypervisor then can detect old guests > that spin and decide what to do, e.g. prioritise cvq more, > or fail FEATURES_OK.We suffer from this for bare metal as well. But a question is what's wrong with the approach that is used in this patch? I've answered that set_rx_mode is not reliable, so it should be fine to use workqueue. Except for this, any other thing that worries you? Thanks> > > > > > > > > > > > -- > > > > > MST > > > > > > > > >
Michael S. Tsirkin
2023-May-16 04:12 UTC
[PATCH net-next V2 1/2] virtio-net: convert rx mode setting to use workqueue
On Tue, May 16, 2023 at 10:44:45AM +0800, Jason Wang wrote:> On Mon, May 15, 2023 at 6:17?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > On Mon, May 15, 2023 at 01:13:33PM +0800, Jason Wang wrote: > > > On Mon, May 15, 2023 at 12:45?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > On Mon, May 15, 2023 at 09:05:54AM +0800, Jason Wang wrote: > > > > > On Wed, May 10, 2023 at 1:33?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > > > > > On Mon, Apr 17, 2023 at 11:40:58AM +0800, Jason Wang wrote: > > > > > > > On Fri, Apr 14, 2023 at 3:21?PM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > > > > > > > > > On Fri, Apr 14, 2023 at 01:04:15PM +0800, Jason Wang wrote: > > > > > > > > > Forget to cc netdev, adding. > > > > > > > > > > > > > > > > > > On Fri, Apr 14, 2023 at 12:25?AM Michael S. Tsirkin <mst at redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > On Thu, Apr 13, 2023 at 02:40:26PM +0800, Jason Wang wrote: > > > > > > > > > > > This patch convert rx mode setting to be done in a workqueue, this is > > > > > > > > > > > a must for allow to sleep when waiting for the cvq command to > > > > > > > > > > > response since current code is executed under addr spin lock. > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang at redhat.com> > > > > > > > > > > > > > > > > > > > > I don't like this frankly. This means that setting RX mode which would > > > > > > > > > > previously be reliable, now becomes unreliable. > > > > > > > > > > > > > > > > > > It is "unreliable" by design: > > > > > > > > > > > > > > > > > > void (*ndo_set_rx_mode)(struct net_device *dev); > > > > > > > > > > > > > > > > > > > - first of all configuration is no longer immediate > > > > > > > > > > > > > > > > > > Is immediate a hard requirement? I can see a workqueue is used at least: > > > > > > > > > > > > > > > > > > mlx5e, ipoib, efx, ... > > > > > > > > > > > > > > > > > > > and there is no way for driver to find out when > > > > > > > > > > it actually took effect > > > > > > > > > > > > > > > > > > But we know rx mode is best effort e.g it doesn't support vhost and we > > > > > > > > > survive from this for years. > > > > > > > > > > > > > > > > > > > - second, if device fails command, this is also not > > > > > > > > > > propagated to driver, again no way for driver to find out > > > > > > > > > > > > > > > > > > > > VDUSE needs to be fixed to do tricks to fix this > > > > > > > > > > without breaking normal drivers. > > > > > > > > > > > > > > > > > > It's not specific to VDUSE. For example, when using virtio-net in the > > > > > > > > > UP environment with any software cvq (like mlx5 via vDPA or cma > > > > > > > > > transport). > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > Hmm. Can we differentiate between these use-cases? > > > > > > > > > > > > > > It doesn't look easy since we are drivers for virtio bus. Underlayer > > > > > > > details were hidden from virtio-net. > > > > > > > > > > > > > > Or do you have any ideas on this? > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > I don't know, pass some kind of flag in struct virtqueue? > > > > > > "bool slow; /* This vq can be very slow sometimes. Don't wait for it! */" > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > So if it's slow, sleep, otherwise poll? > > > > > > > > > > I feel setting this flag might be tricky, since the driver doesn't > > > > > know whether or not it's really slow. E.g smartNIC vendor may allow > > > > > virtio-net emulation over PCI. > > > > > > > > > > Thanks > > > > > > > > driver will have the choice, depending on whether > > > > vq is deterministic or not. > > > > > > Ok, but the problem is, such booleans are only useful for virtio ring > > > codes. But in this case, virtio-net knows what to do for cvq. So I'm > > > not sure who the user is. > > > > > > Thanks > > > > Circling back, what exactly does the architecture you are trying > > to fix look like? Who is going to introduce unbounded latency? > > The hypervisor? > > Hypervisor is one of the possible reason, we have many more: > > Hardware device that provides virtio-pci emulation. > Userspace devices like VDUSE.So let's start by addressing VDUSE maybe?> > If so do we not maybe want a new feature bit > > that documents this? Hypervisor then can detect old guests > > that spin and decide what to do, e.g. prioritise cvq more, > > or fail FEATURES_OK. > > We suffer from this for bare metal as well. > > But a question is what's wrong with the approach that is used in this > patch? I've answered that set_rx_mode is not reliable, so it should be > fine to use workqueue. Except for this, any other thing that worries > you? > > ThanksIt's not reliable for other drivers but has been reliable for virtio. I worry some software relied on this. You are making good points though ... could we get some maintainer's feedback on this?> > > > > > > > > > > > > > > > -- > > > > > > MST > > > > > > > > > > > >