Michael S. Tsirkin
2022-Dec-27 07:33 UTC
[PATCH 3/4] virtio_ring: introduce a per virtqueue waitqueue
On Tue, Dec 27, 2022 at 12:30:35PM +0800, Jason Wang wrote:
> > But device is still going and will later use the buffers.
> >
> > Same for timeout really.
>
> Avoiding infinite wait/poll is one of the goals, another is to sleep.
> If we think the timeout is hard, we can start from the wait.
>
> Thanks

If the goal is to avoid disrupting traffic while CVQ is in use,
that sounds more reasonable. E.g. someone is turning on promisc,
a spike in CPU usage might be unwelcome.

Things we should be careful to address then:

1- debugging. Currently it's easy to see a warning if CPU is stuck
   in a loop for a while, and we also get a backtrace.
   E.g. with this - how do we know who has the RTNL?
   We need to integrate with kernel/watchdog.c for good results
   and to make sure policy is consistent.
2- overhead. In a very common scenario when device is in hypervisor,
   programming timers etc has a very high overhead, at bootup
   lots of CVQ commands are run and slowing boot down is not nice.
   Let's poll for a bit before waiting?
3- surprise removal. Need to wake up thread in some way. What about
   other cases of device breakage - is there a chance this
   introduces new bugs around that? At least enumerate them please.

-- 
MST
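
[Editorial illustration, not part of the original message.] The "poll for a bit before waiting" idea in point 2 can be sketched in userspace C as below. This is only a rough analogy under stated assumptions: pthread primitives stand in for the kernel waitqueue, and every name here (struct cvq_wait, cvq_complete, cvq_wait_done, spin_budget) is hypothetical and does not come from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one pending CVQ command. */
struct cvq_wait {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	atomic_bool     done;	/* set once the "device" has used the buffer */
};

#define CVQ_WAIT_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Completion side: mark the command done and wake any sleeper. */
static void cvq_complete(struct cvq_wait *w)
{
	pthread_mutex_lock(&w->lock);
	atomic_store(&w->done, true);
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/*
 * Waiter side: poll up to spin_budget times, then fall back to sleeping.
 * Returns true if the completion was observed while polling, false if
 * the slow (sleeping) path was taken.
 */
static bool cvq_wait_done(struct cvq_wait *w, unsigned int spin_budget)
{
	for (unsigned int i = 0; i < spin_budget; i++) {
		if (atomic_load(&w->done))
			return true;	/* fast path: no sleep, no timer */
	}

	pthread_mutex_lock(&w->lock);
	while (!atomic_load(&w->done))
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return false;
}
```

How large the spin budget should be is exactly the open question in this thread; the sketch leaves it as a caller-supplied parameter and does not suggest a value.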
Jason Wang
2022-Dec-27 09:12 UTC
[PATCH 3/4] virtio_ring: introduce a per virtqueue waitqueue
On 2022/12/27 15:33, Michael S. Tsirkin wrote:
> On Tue, Dec 27, 2022 at 12:30:35PM +0800, Jason Wang wrote:
>>> But device is still going and will later use the buffers.
>>>
>>> Same for timeout really.
>> Avoiding infinite wait/poll is one of the goals, another is to sleep.
>> If we think the timeout is hard, we can start from the wait.
>>
>> Thanks
> If the goal is to avoid disrupting traffic while CVQ is in use,
> that sounds more reasonable. E.g. someone is turning on promisc,
> a spike in CPU usage might be unwelcome.

Yes, this would be more obvious if UP is used.

> things we should be careful to address then:
> 1- debugging. Currently it's easy to see a warning if CPU is stuck
>    in a loop for a while, and we also get a backtrace.
>    E.g. with this - how do we know who has the RTNL?
>    We need to integrate with kernel/watchdog.c for good results
>    and to make sure policy is consistent.

That's fine, will consider this.

> 2- overhead. In a very common scenario when device is in hypervisor,
>    programming timers etc has a very high overhead, at bootup
>    lots of CVQ commands are run and slowing boot down is not nice.
>    let's poll for a bit before waiting?

Then we go back to the question of choosing a good timeout for poll.
And poll seems problematic in the case of UP, the scheduler might not
have the chance to run.

> 3- surprise removal. need to wake up thread in some way. what about
>    other cases of device breakage - is there a chance this
>    introduces new bugs around that? at least enumerate them please.

The current code does:

1) check for vq->broken
2) wake up during BAD_RING()

So we won't end up with a process that is never woken up, which
should be fine.

Thanks
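
[Editorial illustration, not part of the original message.] The two properties described above (the waiter checks a broken flag, and marking the ring broken wakes the waiter) can be sketched in userspace C as follows. It is a simplified analogy, not the kernel code: pthread primitives replace the waitqueue, and struct vq_sim and the vq_sim_* helpers are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a virtqueue with a per-queue wait condition. */
struct vq_sim {
	pthread_mutex_t lock;
	pthread_cond_t  wq;
	bool used;	/* the device returned the buffer */
	bool broken;	/* the ring was marked broken (e.g. device gone) */
};

#define VQ_SIM_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false }

/* Normal completion: record it and wake the waiter. */
static void vq_sim_mark_used(struct vq_sim *vq)
{
	pthread_mutex_lock(&vq->lock);
	vq->used = true;
	pthread_cond_broadcast(&vq->wq);
	pthread_mutex_unlock(&vq->lock);
}

/* Breakage path: set the flag AND wake the waiter so it cannot sleep forever. */
static void vq_sim_mark_broken(struct vq_sim *vq)
{
	pthread_mutex_lock(&vq->lock);
	vq->broken = true;
	pthread_cond_broadcast(&vq->wq);
	pthread_mutex_unlock(&vq->lock);
}

/* Sleep until the buffer is used or the queue is broken. */
static int vq_sim_wait(struct vq_sim *vq)
{
	int ret;

	pthread_mutex_lock(&vq->lock);
	while (!vq->used && !vq->broken)	/* wait condition includes "broken" */
		pthread_cond_wait(&vq->wq, &vq->lock);
	ret = vq->used ? 0 : -EIO;
	pthread_mutex_unlock(&vq->lock);
	return ret;
}
```

The waiter only stays asleep while neither flag is set, so any path that sets broken must also issue the wakeup. Whether every breakage path in the real driver does so is the enumeration requested in point 3.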