Cornelia Huck
2022-Apr-27  09:27 UTC
[PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()
On Tue, Apr 26 2022, "Michael S. Tsirkin" <mst at redhat.com> wrote:

> On Tue, Apr 26, 2022 at 05:47:17PM +0200, Cornelia Huck wrote:
>> On Mon, Apr 25 2022, "Michael S. Tsirkin" <mst at redhat.com> wrote:
>>
>> > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
>> >> On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
>> >> >
>> >> > On 2022/4/26 11:38, Michael S. Tsirkin wrote:
>> >> > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
>> >> > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
>> >> > > > > On Mon, 25 Apr 2022 09:59:55 -0400
>> >> > > > > "Michael S. Tsirkin" <mst at redhat.com> wrote:
>> >> > > > >
>> >> > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
>> >> > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin" <mst at redhat.com> wrote:
>> >> > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
>> >> > > > > > > > > This patch tries to implement the synchronize_cbs() for ccw. For the
>> >> > > > > > > > > vring_interrupt() that is called via virtio_airq_handler(), the
>> >> > > > > > > > > synchronization is simply done via the airq_info's lock. For the
>> >> > > > > > > > > vring_interrupt() that is called via virtio_ccw_int_handler(), a per
>> >> > > > > > > > > device spinlock for irq is introduced and used in the synchronization
>> >> > > > > > > > > method.
>> >> > > > > > > > >
>> >> > > > > > > > > Cc: Thomas Gleixner <tglx at linutronix.de>
>> >> > > > > > > > > Cc: Peter Zijlstra <peterz at infradead.org>
>> >> > > > > > > > > Cc: "Paul E. McKenney" <paulmck at kernel.org>
>> >> > > > > > > > > Cc: Marc Zyngier <maz at kernel.org>
>> >> > > > > > > > > Cc: Halil Pasic <pasic at linux.ibm.com>
>> >> > > > > > > > > Cc: Cornelia Huck <cohuck at redhat.com>
>> >> > > > > > > > > Signed-off-by: Jason Wang <jasowang at redhat.com>
>> >> > > > > > > >
>> >> > > > > > > > This is the only one that is giving me pause. Halil, Cornelia,
>> >> > > > > > > > should we be concerned about the performance impact here?
>> >> > > > > > > > Any chance it can be tested?
>> >> > > > > > > We can have a bunch of devices using the same airq structure, and the
>> >> > > > > > > sync cb creates a choke point, same as registering/unregistering.
>> >> > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at the moment?
>> >> > > > > I'm not sure I understand the question.
>> >> > > > >
>> >> > > > > I do think we can have multiple CPUs that are executing some portion of
>> >> > > > > virtio_ccw_int_handler(). So I guess the answer is yes. Connie what do you think?
>> >> > > > >
>> >> > > > > On the other hand we could also end up serializing synchronize_cbs()
>> >> > > > > calls for different devices if they happen to use the same airq_info. But
>> >> > > > > this probably was not your question
>> >> > > >
>> >> > > > I am less concerned about synchronize_cbs being slow and more about
>> >> > > > the slowdown in interrupt processing itself.
>> >> > > >
>> >> > > > > > this patch serializes them on a spinlock.
>> >> > > > > >
>> >> > > > > Those could then pile up on the newly introduced spinlock.
>>
>> How bad would that be in practice? IIUC, we hit on the spinlock when
>> - doing synchronize_cbs (should be rare)
>> - processing queue interrupts for devices using per-device indicators
>>   (which is the non-preferred path, which I would basically only expect
>>   when running on an ancient or non-standard hypervisor)
>
> this one is my concern. I am worried serializing everything on a single lock
> will drastically regress performance here.

Yeah, that case could get much worse. OTOH, how likely is it that any
setup that runs a recent kernel will actually end up with devices using
per-device indicators? Anything running under a QEMU released in the
last couple of years is unlikely to not use airqs, I think. Halil, do
you think that the classic indicator setup would be more common on any
non-QEMU hypervisors?

IOW, how much effort is it worth spending on optimizing this case? We
certainly should explore any simple solutions, but I don't think we need
to twist ourselves into pretzels to solve it.

>
>
>> - configuration change interrupts (should be rare)
>> - during setup, reset, etc. (should not be a concern)
On Wed, 27 Apr 2022 11:27:03 +0200
Cornelia Huck <cohuck at redhat.com> wrote:

> On Tue, Apr 26 2022, "Michael S. Tsirkin" <mst at redhat.com> wrote:
>
> > On Tue, Apr 26, 2022 at 05:47:17PM +0200, Cornelia Huck wrote:
> >> On Mon, Apr 25 2022, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> >>
> >> > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> >> >> On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> >> >> >
> >> >> > On 2022/4/26 11:38, Michael S. Tsirkin wrote:
> >> >> > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
> >> >> > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> >> >> > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> >> >> > > > > "Michael S. Tsirkin" <mst at redhat.com> wrote:
> >> >> > > > >
> >> >> > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> >> >> > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> >> >> > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> >> >> > > > > > > > > This patch tries to implement the synchronize_cbs() for ccw. For the
> >> >> > > > > > > > > vring_interrupt() that is called via virtio_airq_handler(), the
> >> >> > > > > > > > > synchronization is simply done via the airq_info's lock. For the
> >> >> > > > > > > > > vring_interrupt() that is called via virtio_ccw_int_handler(), a per
> >> >> > > > > > > > > device spinlock for irq is introduced and used in the synchronization
> >> >> > > > > > > > > method.
> >> >> > > > > > > > >
> >> >> > > > > > > > > Cc: Thomas Gleixner <tglx at linutronix.de>
> >> >> > > > > > > > > Cc: Peter Zijlstra <peterz at infradead.org>
> >> >> > > > > > > > > Cc: "Paul E. McKenney" <paulmck at kernel.org>
> >> >> > > > > > > > > Cc: Marc Zyngier <maz at kernel.org>
> >> >> > > > > > > > > Cc: Halil Pasic <pasic at linux.ibm.com>
> >> >> > > > > > > > > Cc: Cornelia Huck <cohuck at redhat.com>
> >> >> > > > > > > > > Signed-off-by: Jason Wang <jasowang at redhat.com>
> >> >> > > > > > > >
> >> >> > > > > > > > This is the only one that is giving me pause. Halil, Cornelia,
> >> >> > > > > > > > should we be concerned about the performance impact here?
> >> >> > > > > > > > Any chance it can be tested?
> >> >> > > > > > > We can have a bunch of devices using the same airq structure, and the
> >> >> > > > > > > sync cb creates a choke point, same as registering/unregistering.
> >> >> > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at the moment?
> >> >> > > > > I'm not sure I understand the question.
> >> >> > > > >
> >> >> > > > > I do think we can have multiple CPUs that are executing some portion of
> >> >> > > > > virtio_ccw_int_handler(). So I guess the answer is yes. Connie what do you think?
> >> >> > > > >
> >> >> > > > > On the other hand we could also end up serializing synchronize_cbs()
> >> >> > > > > calls for different devices if they happen to use the same airq_info. But
> >> >> > > > > this probably was not your question
> >> >> > > >
> >> >> > > > I am less concerned about synchronize_cbs being slow and more about
> >> >> > > > the slowdown in interrupt processing itself.
> >> >> > > >
> >> >> > > > > > this patch serializes them on a spinlock.
> >> >> > > > > >
> >> >> > > > > Those could then pile up on the newly introduced spinlock.
> >>
> >> How bad would that be in practice? IIUC, we hit on the spinlock when
> >> - doing synchronize_cbs (should be rare)
> >> - processing queue interrupts for devices using per-device indicators
> >>   (which is the non-preferred path, which I would basically only expect
> >>   when running on an ancient or non-standard hypervisor)
> >
> > this one is my concern. I am worried serializing everything on a single lock
> > will drastically regress performance here.
>
> Yeah, that case could get much worse. OTOH, how likely is it that any
> setup that runs a recent kernel will actually end up with devices using
> per-device indicators? Anything running under a QEMU released in the
> last couple of years is unlikely to not use airqs, I think. Halil, do
> you think that the classic indicator setup would be more common on any
> non-QEMU hypervisors?
>

I really don't know. My opinion is that two-stage indicators are kind of
recommended for anybody who cares about notification performance.

> IOW, how much effort is it worth spending on optimizing this case? We
> certainly should explore any simple solutions, but I don't think we need
> to twist ourselves into pretzels to solve it.
>

Frankly, I would be fine with an rwlock-based solution as proposed by
Jason. My rationale is: we recommend two-stage indicators, and the
two-stage indicators are already encumbered by an rwlock on the interrupt
path. Yes, the coalescence of adapter interrupts is architecturally
different, and so it is with GISA (without GISA, I'm not even sure), so
this rwlock ends up being worse than the one for two-stage indicators.
But my feeling is that it should be fine. On the other hand, I don't feel
comfortable with a plain spinlock, and I am curious about a more advanced
solution. But my guess is that rwlock + some testing for the legacy
indicator case, just to double-check whether there is a heavy regression
despite our expectation of seeing none, should do the trick.

Regards,
Halil

>
> >
> >
> >> - configuration change interrupts (should be rare)
> >> - during setup, reset, etc. (should not be a concern)
>