On 2021/2/18 11:15, Gautam Dawar wrote:
> Hi Jason,
>
> Thanks for your response.
>
> On Thu, 18 Feb 2021 at 14:18, Jason Wang <jasowang at redhat.com> wrote:
>
> Hi Gautam:
>
> On 2021/2/15 9:01, Gautam Dawar wrote:
>>
>> Hi Jason/Michael,
>>
>> I observed a kernel panic while testing vhost-vdpa with Xilinx
>> adapters. Here are the details for your review:
>>
>> Problem statement:
>>
>> When QEMU with a vhost-vdpa netdev is run for the first time, it
>> works well. But after the VM is powered off, the next QEMU run causes a
>> kernel panic due to a NULL pointer dereference in
>> irq_bypass_register_producer().
>>
>> Root cause analysis:
>>
>> When the VM is powered off, vhost_dev_stop() is invoked, which in
>> turn calls vhost_vdpa_reset_device(), causing the irq_bypass
>> producers to be unregistered.
>>
>> On the next run, when QEMU opens the vhost device, the
>> vhost_vdpa_open() file operation calls vhost_dev_init(). Here,
>> the memory of call_ctx.producer is cleared in
>> vhost_vring_call_reset().
>>
>> Further, when the virtqueues are initialized by
>> vhost_virtqueue_init(), vhost_vdpa_setup_vq_irq() again registers
>> the irq_bypass producer for each virtqueue. As the node member of
>> struct irq_bypass_producer has also been zeroed while the node is
>> still on the producers list, traversing that list causes a crash due
>> to a NULL pointer dereference.
>>
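A userspace model of the sequence described above may make the failure mode easier to see. It is a sketch under simplifying assumptions, not kernel code: `struct model_producer`, `model_register()`, `model_walk()` and `replay_crash_sequence()` are hypothetical names, and the list is a minimal stand-in for the kernel's `list_head`. It zeroes a producer while it is still linked, as the report says vhost_vring_call_reset() effectively does, and then walks the list again.

```c
#include <assert.h>
#include <string.h>

/* Simplified circular doubly linked list node (stand-in for list_head). */
struct node {
	struct node *next, *prev;
};

/* Hypothetical, reduced stand-in for struct irq_bypass_producer. */
struct model_producer {
	struct node node;
	void *token;	/* the call eventfd context in vhost-vdpa */
	int irq;
};

/* Global list head, playing the role of the irqbypass producers list. */
static struct node producers = { &producers, &producers };

/* Append a producer to the tail of the list. */
static void model_register(struct model_producer *p)
{
	p->node.prev = producers.prev;
	p->node.next = &producers;
	producers.prev->next = &p->node;
	producers.prev = &p->node;
}

/* Walk the list as a registration path would. Returns the number of
 * linked nodes, or -1 on reaching a NULL link (the point where the
 * kernel would dereference NULL and oops). */
static int model_walk(void)
{
	int count = 0;

	for (struct node *n = producers.next; n != &producers; n = n->next) {
		if (n == NULL)
			return -1;
		count++;
	}
	return count;
}

/* Replay: register a producer, zero it while it is still linked, then
 * walk the list again. */
static int replay_crash_sequence(void)
{
	static int eventfd_ctx;			/* placeholder token */
	static struct model_producer p;

	p.token = &eventfd_ctx;
	p.irq = 0;
	model_register(&p);
	assert(model_walk() == 1);		/* one healthy entry */

	memset(&p, 0, sizeof(p));		/* node.next/prev become NULL */
	return model_walk();
}
```

Here `replay_crash_sequence()` returns -1: the list head still points at the zeroed node, whose `next` pointer is NULL. The model reports that instead of following it; the traversal in irq_bypass_register_producer() would dereference it, which matches the reported oops.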
>
> Thanks a lot for reporting this issue.
>
>
>> Fix details:
>>
>> I think that this issue can be fixed by invoking
>> vhost_vdpa_setup_vq_irq() only when vhost_vdpa_set_status()
>> includes VIRTIO_CONFIG_S_DRIVER_OK in the new status value. This
>> way, there won't be any stale nodes in the irqbypass module's
>> producers list which are reset in vhost_vring_call_reset().
>>
>> Patch:
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index 62a9bb0efc55..fdad94e2fbf9 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -409,7 +409,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>>  			cb.private = NULL;
>>  		}
>>  		ops->set_vq_cb(vdpa, idx, &cb);
>> -		vhost_vdpa_setup_vq_irq(v, idx);
>>  		break;
>>  
>>  	case VHOST_SET_VRING_NUM:
>>
>> We can also track this issue in Bugzilla ticket 211711
>> (https://bugzilla.kernel.org/show_bug.cgi?id=211711), and the
>> complete patch is attached to this email.
>>
>
> vhost supports removing or switching the eventfd through
> vhost_vdpa_vring_ioctl(). So if userspace wants to switch to
> another eventfd, we need to redo the unregister and register.
>
> GD>> This makes sense. I missed the use case where userspace may want
> to switch to a different eventfd.
This can happen when the interrupt needs to be disabled for some reason (e.g.
MSI-X is masked).
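The unregister/register pairing for an eventfd switch can be sketched in userspace as follows. This assumes the SET_VRING_CALL path always drops the current registration and registers again only when a new eventfd is supplied; passing NULL models userspace removing the eventfd (for example while MSI-X is masked). All names (`model_set_vring_call()`, `model_register()`, `model_unregister()`, `count_producers()`) are hypothetical, and the list is a simplified stand-in for the irqbypass producers list, not its real token-based lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list node (stand-in for list_head). */
struct node {
	struct node *next, *prev;
};

/* Hypothetical, reduced producer: only the eventfd token matters here. */
struct model_producer {
	struct node node;	/* next == NULL means "not on the list" */
	void *token;
};

static struct node producers = { &producers, &producers };

static int is_linked(const struct model_producer *p)
{
	return p->node.next != NULL;
}

static void model_register(struct model_producer *p)
{
	assert(!is_linked(p));	/* adding a linked node twice corrupts the list */
	p->node.prev = producers.prev;
	p->node.next = &producers;
	producers.prev->next = &p->node;
	producers.prev = &p->node;
}

/* Unlink the producer if it is on the list; a no-op otherwise. */
static void model_unregister(struct model_producer *p)
{
	if (!is_linked(p))
		return;
	p->node.prev->next = p->node.next;
	p->node.next->prev = p->node.prev;
	p->node.next = p->node.prev = NULL;
}

/* Sketch of SET_VRING_CALL handling: always drop the old registration,
 * then register again only if userspace supplied a new eventfd. */
static void model_set_vring_call(struct model_producer *p, void *new_ctx)
{
	model_unregister(p);
	p->token = new_ctx;
	if (new_ctx != NULL)
		model_register(p);
}

static int count_producers(void)
{
	int count = 0;

	for (struct node *n = producers.next; n != &producers; n = n->next)
		count++;
	return count;
}
```

Switching from one eventfd to another leaves exactly one entry carrying the new token, and removing the eventfd leaves none. In this model, dropping the registration step from this path (as the first proposed patch did) would leave the producer bound to the old eventfd after a switch.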
>
> I think we need to deal with this issue in another way. Can we check
> whether or not the producer has been initialized before?
>
> Thanks
>
> GD>> The initialization path is fine, but the actual problem lies in the
> clean-up path.
> I think the following check is the cause of this issue:
>
> static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
> ...
> 		if (vq->call_ctx.producer.irq)
> 			irq_bypass_unregister_producer(&vq->call_ctx.producer);
> The above if condition prevents the de-initialization of the
> producer nodes corresponding to irq 0, but
> irq_bypass_unregister_producer() should be called for all valid irq
> values, including zero.
>
> Accordingly, the following patch is required to fix this issue:
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 62a9bb0efc55..d1c3a33c6239 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -849,7 +849,7 @@ static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
>  
>  	for (i = 0; i < v->nvqs; i++) {
>  		vq = &v->vqs[i];
> -		if (vq->call_ctx.producer.irq)
> +		if (vq->call_ctx.producer.irq >= 0)
>  			irq_bypass_unregister_producer(&vq->call_ctx.producer);
>  	}
>  }
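The effect of the guard change can be checked with a similar userspace model (hypothetical names throughout; the list is a simplified stand-in, and unregistering simply unlinks the node rather than performing the real irqbypass token lookup). `guard_original()` mirrors `if (vq->call_ctx.producer.irq)` and `guard_patched()` mirrors `if (vq->call_ctx.producer.irq >= 0)`; each run registers a producer, cleans up with the chosen guard, zeroes the producer as vhost_vring_call_reset() does, and walks the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified circular doubly linked list node (stand-in for list_head). */
struct node {
	struct node *next, *prev;
};

/* Hypothetical, reduced stand-in for struct irq_bypass_producer. */
struct model_producer {
	struct node node;
	void *token;
	int irq;
};

static struct node producers = { &producers, &producers };

static void model_register(struct model_producer *p)
{
	p->node.prev = producers.prev;
	p->node.next = &producers;
	producers.prev->next = &p->node;
	producers.prev = &p->node;
}

static void model_unregister(struct model_producer *p)
{
	p->node.prev->next = p->node.next;
	p->node.next->prev = p->node.prev;
	p->node.next = p->node.prev = NULL;
}

/* Count linked nodes; -1 if a NULL link is reached (the kernel would oops). */
static int model_walk(void)
{
	int count = 0;

	for (struct node *n = producers.next; n != &producers; n = n->next) {
		if (n == NULL)
			return -1;
		count++;
	}
	return count;
}

/* The two guards discussed in the thread. */
static bool guard_original(int irq) { return irq != 0; }  /* if (producer.irq) */
static bool guard_patched(int irq) { return irq >= 0; }   /* if (producer.irq >= 0) */

/* Register a producer on `irq`, run clean-up with `guard`, zero the
 * producer as vhost_vring_call_reset() does, then walk the list.
 * Returns 0 for a clean list, -1 if a zeroed node is still linked. */
static int replay_cleanup(int irq, bool (*guard)(int))
{
	static int eventfd_ctx;		/* placeholder token */
	struct model_producer p = { { NULL, NULL }, &eventfd_ctx, irq };
	int result;

	model_register(&p);
	if (guard(p.irq))
		model_unregister(&p);
	memset(&p, 0, sizeof(p));
	result = model_walk();

	/* Reset the list so no pointer to the local `p` outlives this call. */
	producers.next = producers.prev = &producers;
	return result;
}
```

For irq 0 the original guard skips the unregister, so `replay_cleanup(0, guard_original)` returns -1 (a zeroed node is still linked), while `replay_cleanup(0, guard_patched)` returns 0. For a non-zero irq both guards leave the list clean.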
It should work; please post a formal patch for this.
I will give it more thought in the meantime, since I have spotted some other
defects in the irqbypass usage in the vdpa code.
Thanks
>
> The revised patch (bug211711_fix.patch) is also attached to this email.
>
>> Regards,
>>
>> Gautam Dawar
>>