? 2021/4/29 ??4:53, Si-Wei Liu ??:> On Sun, Apr 25, 2021 at 7:38 PM Jason Wang <jasowang at redhat.com>
wrote:
>>
>> ? 2021/4/25 ??9:25, Eli Cohen ??:
>>> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>>>> ? 2021/4/22 ??4:39, Eli Cohen ??:
>>>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>>>> ? 2021/4/22 ??4:07, Eli Cohen ??:
>>>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen
wrote:
>>>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason
Wang wrote:
>>>>>>>>> ? 2021/4/21 ??6:41, Eli Cohen ??:
>>>>>>>>>> Implement mlx5_get_vq_notification() to
return the doorbell address.
>>>>>>>>>> Size is set to one system page as
required.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eli Cohen <elic at
nvidia.com>
>>>>>>>>>> ---
>>>>>>>>>>
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>>>
drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>>> drivers/vdpa/mlx5/net/mlx5_vnet.c
| 6 ++++++
>>>>>>>>>> 3 files changed, 8 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git
a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>>>> ---
a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> +++
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> @@ -41,6 +41,7 @@ struct
mlx5_vdpa_resources {
>>>>>>>>>> u32 pdn;
>>>>>>>>>> struct mlx5_uars_page *uar;
>>>>>>>>>> void __iomem *kick_addr;
>>>>>>>>>> + u64 phys_kick_addr;
>>>>>>>>>> u16 uid;
>>>>>>>>>> u32 null_mkey;
>>>>>>>>>> bool valid;
>>>>>>>>>> diff --git
a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>>>> ---
a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> +++
b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> @@ -247,6 +247,7 @@ int
mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>>>>>>>>>> goto err_key;
>>>>>>>>>> kick_addr = mdev->bar_addr +
offset;
>>>>>>>>>> + res->phys_kick_addr =
kick_addr;
>>>>>>>>>> res->kick_addr =
ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>>> if (!res->kick_addr) {
>>>>>>>>>> diff --git
a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> @@ -1865,8 +1865,14 @@ static void
mlx5_vdpa_free(struct vdpa_device *vdev)
>>>>>>>>>> static struct
vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16
idx)
>>>>>>>>>> {
>>>>>>>>>> + struct mlx5_vdpa_dev *mvdev =
to_mvdev(vdev);
>>>>>>>>>> struct vdpa_notification_area
ret = {};
>>>>>>>>>> + struct mlx5_vdpa_net *ndev;
>>>>>>>>>> +
>>>>>>>>>> + ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>> + ret.addr =
(phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>>>> + ret.size = PAGE_SIZE;
>>>>>>>>> Note that the page will be mapped in to
guest, so it's only safe if the
>>>>>>>>> doorbeel exclusively own the page. This
means if there're other registers in
>>>>>>>>> the page, we can not let the doorbell
bypass to work.
>>>>>>>>>
>>>>>>>>> So this is suspicious at least in the case
of subfunction where we calculate
>>>>>>>>> the bar length in
mlx5_sf_dev_table_create() as:
>>>>>>>>>
>>>>>>>>> table->sf_bar_length = 1 <<
(MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>>>>>>>>
>>>>>>>>> It looks to me this can only work for the
arch with PAGE_SIZE = 4096,
>>>>>>>>> otherwise we can map more into the
userspace(guest).
>>>>>>>>>
>>>>>>>> Correct, so I guess I should return here 4096.
>>>>>> I'm not quite sure but since the calculation of the
sf_bar_length is doen
>>>>>> via a shift of 12, it might be correct.
>>>>>>
>>>>>> And please double check if the doorbell own the page
exclusively.
>>>>> I am checking if it is safe to map the any part of the
SF's BAR to
>>>>> userspace without harming other functions. If this is true,
I will check
>>>>> if I can return PAGE_SIZE without compromising security.
>>>> It's usally not safe and a layer violation if other
registers are placed at
>>>> the same page.
>>>>
>>>>
>>>>> I think we may
>>>>> need to extend struct vdpa_notification_area to contain
another field
>>>>> offset which indicates the offset from addr where the
actual doorbell
>>>>> resides.
>>>> The movitiaton of the current design is to be fit seamless into
how Qemu
>>>> model doorbell layouts currently:
>>>>
>>>> 1) page-per-vq, each vq has its own page aligned doorbell
>>>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend
doorbell
>>>>
>>>> Only 1) is support in vhost-vDPA (and vhost-user) since
it's rather simple
>>>> and secure (page aligned) to be modelled and implemented via
mmap().
>>>>
>>>> Exporting a complex layout is possbile but requires careful
design.
>>>>
>>>> Actually, we had antoher option
>>>>
>>>> 3) shared doorbell: all virtqueue shares a single page aligned
doorbell
>>>>
>>> This nearly matches we have in ConnectX devices. All the doorbells
are
>>> located at the same place. For 4K page size atchitectures it is
aligned
>>> to the start of the page. For larger page sizes it is not aligned.
>>> If we don't allow to some offset within the page, it means that
direct
>>> doorbells will not work for 64K page size archs over ConnectX.
>>
>> Right, just to clarify. This can still be model by the current
>> page-per-vq model. It means the doorbell will be mapped into different
>> pages for each virtqueue by Qemu. So from the view of Qemu or guest,
>> each virtqueue has its own doorbell in this case.
> So this is the proposed model for mlx5 vdpa with doorbell per-instance
> (rather than per-vq), assuming the exclusive ownership of mapped
> doorbell page?
So let me try to explain here:
Let's assume the mlx5 vDPA has a exclusive page which is used for
doorbell for all the virtqueues.
This fits for the page-per-vq model of Qemu:
1) The virtio-pci devices emulated by Qemu provides a per virtqueue
doorbell (one page per vq)
2) Qemu use mmap() for map the doorbell separately
3) The vhost-vDPA prepare the page, and in this case, it always provide
the same page for userspace/Qemu
4) So vhost-vDPA actually map the single physical page into multiple
virtual pages and use those virtual pages for per-vq-doorbell with the
help of KVM
5) The device can still distinguish the virtqueue index since it is
required as the value wrote to the doorbell
So it should work like a charm. From qemu point of view, it's
page-per-vq, but at the physical device level, it's the shared doorbell.
>
>>
>>>> This is not yet supported by Qemu.
>>
>> For "not supported" I meant present this (doorbells sharing)
layout to
>> guest.
> So it means this new layout perhaps will have to introduce new virtio
> features to guest thus not compatible with eixsting driver?
Consider page-per-vq model works, I don't think we need to introduce a
userspace noticeable shared doorbell model. The details were hidden
perfectly with the help of vhost-vDPA and it provides extra flexibility.
The mapping is done per vq via:
mmap(NULL, vq_index << PAGE_SIZE);
Userspace can decide to:
1) only map part of the virtqueue, this is very important for the case
when Qemu want to intercept the virtqueue transactions when we want to
implement control vq or shadow vq
2) devices can selectively provide doorbell mapping based on the vq index
Exporting other layout is possible but it will require a careful design
of the uAPI and will complicate the management stuff (migration
compatibility especially).
Thanks
>
> -Siwei
>> Thanks
>>
>>
>>>> Thanks
>>>>
>>>>
>>>>>>>> I also think that the check in
vhost_vdpa_mmap() should verify that the
>>>>>>>> returned size is not smaller than PAGE_SIZE
because the returned address
>>>>>>> Actually I think it's ok since you verify the
size equals vma->vm_end -
>>>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>>>> Yes.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>> might just be aligned to PAGE_SIZE. I think
this should be enoght but
>>>>>>>> maybe also use the same logic in
vhost_vdpa_fault().