Michael S. Tsirkin
2022-Dec-27  09:38 UTC
[PATCH 3/4] virtio_ring: introduce a per virtqueue waitqueue
On Tue, Dec 27, 2022 at 05:12:58PM +0800, Jason Wang wrote:
>
> On 2022/12/27 15:33, Michael S. Tsirkin wrote:
> > On Tue, Dec 27, 2022 at 12:30:35PM +0800, Jason Wang wrote:
> > > > But device is still going and will later use the buffers.
> > > >
> > > > Same for timeout really.
> > > Avoiding infinite wait/poll is one of the goals, another is to sleep.
> > > If we think the timeout is hard, we can start from the wait.
> > >
> > > Thanks
> > If the goal is to avoid disrupting traffic while CVQ is in use,
> > that sounds more reasonable. E.g. someone is turning on promisc,
> > a spike in CPU usage might be unwelcome.
>
> Yes, this would be more obvious if UP is used.
>
> > things we should be careful to address then:
> > 1- debugging. Currently it's easy to see a warning if CPU is stuck
> >    in a loop for a while, and we also get a backtrace.
> >    E.g. with this - how do we know who has the RTNL?
> >    We need to integrate with kernel/watchdog.c for good results
> >    and to make sure policy is consistent.
>
> That's fine, will consider this.
>
> > 2- overhead. In a very common scenario when device is in hypervisor,
> >    programming timers etc has a very high overhead, at bootup
> >    lots of CVQ commands are run and slowing boot down is not nice.
> >    let's poll for a bit before waiting?
>
> Then we go back to the question of choosing a good timeout for poll. And
> poll seems problematic in the case of UP, scheduler might not have the
> chance to run.

Poll just a bit :) Seriously I don't know, but at least check once
after kick.

> > 3- surprise removal. need to wake up thread in some way. what about
> >    other cases of device breakage - is there a chance this
> >    introduces new bugs around that? at least enumerate them please.
>
> The current code did:
>
> 1) check for vq->broken
> 2) wakeup during BAD_RING()
>
> So we won't end up with a never woke up process which should be fine.
>
> Thanks

BTW BAD_RING on removal will trigger dev_err. Not sure that is a good
idea - can cause crashes if kernel panics on error.
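
The "poll just a bit, but at least check once after kick" idea above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch under review: `struct cvq_cmd`, `cvq_cmd_wait()` and the `spin_budget` parameter are hypothetical names, and a pthread condition variable stands in for the per-virtqueue waitqueue.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one control-queue command; not a kernel type. */
struct cvq_cmd {
    atomic_bool done;       /* set by the completion side */
    pthread_mutex_t lock;   /* protects the sleep/wake handshake */
    pthread_cond_t wake;    /* models the waitqueue */
};

enum cvq_wait_result { CVQ_DONE_BY_POLL, CVQ_DONE_BY_SLEEP };

void cvq_cmd_init(struct cvq_cmd *c)
{
    atomic_init(&c->done, false);
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wake, NULL);
}

/* Completion side: publish the result, then wake any sleeper. */
void cvq_cmd_complete(struct cvq_cmd *c)
{
    pthread_mutex_lock(&c->lock);
    atomic_store(&c->done, true);
    pthread_cond_broadcast(&c->wake);
    pthread_mutex_unlock(&c->lock);
}

/*
 * Waiter side: check the flag once unconditionally (the "check once
 * after kick" case), re-check up to spin_budget more times, and only
 * then fall back to a blocking wait.
 */
enum cvq_wait_result cvq_cmd_wait(struct cvq_cmd *c, unsigned int spin_budget)
{
    unsigned int spins = 0;

    do {
        if (atomic_load(&c->done))
            return CVQ_DONE_BY_POLL;
    } while (spins++ < spin_budget);

    pthread_mutex_lock(&c->lock);
    /* Re-test under the lock so a completion that raced with the
     * end of the polling phase is not missed. */
    while (!atomic_load(&c->done))
        pthread_cond_wait(&c->wake, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return CVQ_DONE_BY_SLEEP;
}
```

With `spin_budget` set to 0 this degenerates to a single check followed by a sleep, which sidesteps the question of choosing a poll timeout; any larger budget is a tuning guess and, as noted above, busy polling may still be a poor fit on UP.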
Jason Wang
2022-Dec-28  06:34 UTC
[PATCH 3/4] virtio_ring: introduce a per virtqueue waitqueue
On 2022/12/27 17:38, Michael S. Tsirkin wrote:
> On Tue, Dec 27, 2022 at 05:12:58PM +0800, Jason Wang wrote:
>> On 2022/12/27 15:33, Michael S. Tsirkin wrote:
>>> On Tue, Dec 27, 2022 at 12:30:35PM +0800, Jason Wang wrote:
>>>>> But device is still going and will later use the buffers.
>>>>>
>>>>> Same for timeout really.
>>>> Avoiding infinite wait/poll is one of the goals, another is to sleep.
>>>> If we think the timeout is hard, we can start from the wait.
>>>>
>>>> Thanks
>>> If the goal is to avoid disrupting traffic while CVQ is in use,
>>> that sounds more reasonable. E.g. someone is turning on promisc,
>>> a spike in CPU usage might be unwelcome.
>>
>> Yes, this would be more obvious if UP is used.
>>
>>> things we should be careful to address then:
>>> 1- debugging. Currently it's easy to see a warning if CPU is stuck
>>>    in a loop for a while, and we also get a backtrace.
>>>    E.g. with this - how do we know who has the RTNL?
>>>    We need to integrate with kernel/watchdog.c for good results
>>>    and to make sure policy is consistent.
>>
>> That's fine, will consider this.
>>
>>> 2- overhead. In a very common scenario when device is in hypervisor,
>>>    programming timers etc has a very high overhead, at bootup
>>>    lots of CVQ commands are run and slowing boot down is not nice.
>>>    let's poll for a bit before waiting?
>>
>> Then we go back to the question of choosing a good timeout for poll. And
>> poll seems problematic in the case of UP, scheduler might not have the
>> chance to run.
> Poll just a bit :) Seriously I don't know, but at least check once
> after kick.

I think that is what the current code does: the condition is checked
before trying to sleep in wait_event().

>>> 3- surprise removal. need to wake up thread in some way. what about
>>>    other cases of device breakage - is there a chance this
>>>    introduces new bugs around that? at least enumerate them please.
>>
>> The current code did:
>>
>> 1) check for vq->broken
>> 2) wakeup during BAD_RING()
>>
>> So we won't end up with a never woke up process which should be fine.
>>
>> Thanks
>
> BTW BAD_RING on removal will trigger dev_err. Not sure that is a good
> idea - can cause crashes if kernel panics on error.

Yes, it's better to use __virtqueue_break() instead.

But considering we will start from a wait first, I will limit the
changes to virtio-net without touching the virtio core.

Thanks
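
The two points in this reply (the wait condition is tested before the first sleep, and a broken queue must wake its waiter without going through an error-logging path) can be modelled in userspace C. This is a hedged sketch, not kernel code: `struct model_vq` and the `model_vq_*` functions are hypothetical, a pthread condition variable plays the role of the waitqueue, and `model_vq_break()` only loosely mirrors what a quiet "mark broken and wake" helper would do.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a virtqueue with one waiter. */
struct model_vq {
    pthread_mutex_t lock;
    pthread_cond_t wq;     /* plays the role of the per-virtqueue waitqueue */
    bool used_pending;     /* the device has returned a buffer */
    bool broken;           /* device removed or ring found unusable */
};

void model_vq_init(struct model_vq *vq)
{
    pthread_mutex_init(&vq->lock, NULL);
    pthread_cond_init(&vq->wq, NULL);
    vq->used_pending = false;
    vq->broken = false;
}

/* Completion path: record the returned buffer and wake the waiter. */
void model_vq_complete(struct model_vq *vq)
{
    pthread_mutex_lock(&vq->lock);
    vq->used_pending = true;
    pthread_cond_broadcast(&vq->wq);
    pthread_mutex_unlock(&vq->lock);
}

/* Removal path: mark the queue broken and wake waiters, logging nothing. */
void model_vq_break(struct model_vq *vq)
{
    pthread_mutex_lock(&vq->lock);
    vq->broken = true;
    pthread_cond_broadcast(&vq->wq);
    pthread_mutex_unlock(&vq->lock);
}

/*
 * Returns 0 when a buffer came back, -EIO when the queue broke first.
 * The loop tests the condition before the first sleep, so a completion
 * or a break that already happened never blocks the caller.
 */
int model_vq_wait(struct model_vq *vq)
{
    int ret;

    pthread_mutex_lock(&vq->lock);
    while (!vq->used_pending && !vq->broken)
        pthread_cond_wait(&vq->wq, &vq->lock);
    if (vq->used_pending) {
        vq->used_pending = false;   /* consume the completion */
        ret = 0;
    } else {
        ret = -EIO;
    }
    pthread_mutex_unlock(&vq->lock);
    return ret;
}
```

Because both wake sources feed the same condition, a waiter cannot be left sleeping after removal, which is the property point 3 asks to be enumerated and checked.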