Jason Wang
2020-Dec-02 05:51 UTC
[External] Re: [PATCH 0/7] Introduce vdpa management tool
On 2020/12/2 12:53, Parav Pandit wrote:
>
>> From: Yongji Xie <xieyongji at bytedance.com>
>> Sent: Wednesday, December 2, 2020 9:00 AM
>>
>> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav at nvidia.com> wrote:
>>>
>>>
>>>> From: Yongji Xie <xieyongji at bytedance.com>
>>>> Sent: Tuesday, December 1, 2020 7:49 PM
>>>>
>>>> On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav at nvidia.com> wrote:
>>>>>
>>>>>
>>>>>> From: Yongji Xie <xieyongji at bytedance.com>
>>>>>> Sent: Tuesday, December 1, 2020 3:26 PM
>>>>>>
>>>>>> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang at redhat.com> wrote:
>>>>>>>
>>>>>>> On 2020/11/30 3:07, Yongji Xie wrote:
>>>>>>>>>> Thanks for adding me, Jason!
>>>>>>>>>>
>>>>>>>>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
>>>>>>>>>> Userspace) [1]. This tool is very useful for the vduse
>>>>>>>>>> device, so I'm considering integrating it into my v2
>>>>>>>>>> patchset. But there is one problem:
>>>>>>>>>>
>>>>>>>>>> In this tool, the vdpa device config action and enable
>>>>>>>>>> action are combined into one netlink msg:
>>>>>>>>>> VDPA_CMD_DEV_NEW. But in the vduse case, they need to be
>>>>>>>>>> split, because a chardev should be created and opened by a
>>>>>>>>>> userspace process before we enable the vdpa device (call
>>>>>>>>>> vdpa_register_device()).
>>>>>>>>>>
>>>>>>>>>> So I'd like to know whether it's possible (or whether there
>>>>>>>>>> are plans) to add two new netlink msgs, something like
>>>>>>>>>> VDPA_CMD_DEV_ENABLE and VDPA_CMD_DEV_DISABLE, to make the
>>>>>>>>>> config path more flexible.
>>>>>>>>>>
>>>>>>>>> Actually, we've discussed such an intermediate step in some
>>>>>>>>> early discussion. It looks to me that VDUSE could be one of
>>>>>>>>> the users of this.
>>>>>>>>> Or I wonder whether we can switch to using an anonymous
>>>>>>>>> inode (fd) for VDUSE and then fetch it via a
>>>>>>>>> VDUSE_GET_DEVICE_FD ioctl?
>>>>>>>> Yes, we can. Actually the current implementation in VDUSE is
>>>>>>>> like this. But it seems this is still an intermediate step.
>>>>>>>> The fd should be bound to a name or something else which
>>>>>>>> needs to be configured beforehand.
>>>>>>>
>>>>>>> The name could be specified via netlink. It looks to me that
>>>>>>> the real issue is that until the device is connected with a
>>>>>>> userspace process, it can't be used. So we also need to fail
>>>>>>> the enabling if it isn't opened.
>>>>>> Yes, that's true. So you mean we can first try to fetch the fd
>>>>>> bound to a name/vduse_id via VDUSE_GET_DEVICE_FD, then use the
>>>>>> name/vduse_id as an attribute to create the vdpa device? It
>>>>>> looks fine to me.
>>>>> I probably do not understand this well. I tried reading patch [1]
>>>>> and a few things do not look correct, as below.
>>>>> Creating the vdpa device on the bus device and destroying the
>>>>> device from the workqueue seems unnecessary and racy.
>>>>> It seems the vduse driver needs this to be done as part of the
>>>>> vdpa dev add command, instead of connecting the two sides
>>>>> separately and ensuring race-free access to it.
>>>>> So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
>>>>>
>>>> Yes, we can avoid these two ioctls with the help of the management
>>>> tool.
>>>>
>>>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>>>>>
>>>>> When the above command is executed it creates the necessary vdpa
>>>>> device foo2 on the bus.
>>>>> When the user binds the foo2 device to the vduse driver, in its
>>>>> probe() it creates the respective char device to access it from
>>>>> user space.
>>>>
>>> I see. So vduse cannot work with any existing vdpa devices like
>>> ifc, mlx5 or netdevsim.
>>> It has its own implementation, similar to fuse, with its own
>>> backend of choice.
>>> More below.
>>>
>>>> But the vduse driver is not a vdpa bus driver. It works like the
>>>> vdpasim driver, but offloads the data plane and control plane to a
>>>> user space process.
>>> In that case, to draw parallel lines:
>>>
>>> 1. netdevsim:
>>>    (a) creates resources in kernel sw
>>>    (b) datapath is simulated in kernel
>>>
>>> 2. ifc + mlx5 vdpa dev:
>>>    (a) creates resources in hw
>>>    (b) data path is in hw
>>>
>>> 3. vduse:
>>>    (a) creates resources in userspace sw
>>>    (b) data path is in user space,
>>>        hence it creates data path resources for user space.
>>>
>>> So a char device is created and removed as a result of vdpa device
>>> creation.
>>>
>>> For example,
>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>>>
>>> The above command will create a char device for user space.
>>>
>>> A similar command for ifc/mlx5 would have created a similar channel
>>> for the rest of the config commands in hw.
>>> vduse channel = char device, eventfd, etc.
>>> ifc/mlx5 hw channel = BAR, IRQ, command interface, etc.
>>> netdevsim channel = sw direct calls
>>>
>>> Does it make sense?
>> In my understanding, to make vdpa work, we need a backend (datapath
>> resources) and a frontend (a vdpa device attached to a vdpa bus). In
>> the above example, it looks like we use the command "vdpa dev add ..."
>> to create a backend, so do we need another command to create a
>> frontend?
>>
> For a block device there is certainly some backend to process the
> IOs. Sometimes the backend has to be set up first, before its front
> end is exposed.
> "vdpa dev add" is the front-end command which connects to the backend
> (implicitly) for a network device.
>
> vhost->vdpa_block_device->backend_io_processor (user, hw, kernel).
>
> And it needs a way to connect to the backend when it is explicitly
> specified at creation time.
> Something like,
> $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle <uuid>
> In the above example a vendor-device-specific unique handle is
> passed, based on the backend setup in hardware/user space.
>
> In the 3 examples below, a vdpa block simulator connects to a backend
> block device or file.
>
> $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev /dev/zero
>
> $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev /dev/sda2 size=100M offset=10M
>
> $ vdpa dev add parentdev vdpa_block_filebackend_sim type block name foo6 file /root/file_backend.txt
>
> Or maybe the backend connects to the created vdpa device once it is
> bound to the driver.
> Can vduse attach to the created vdpa block device through the char
> device and establish the channel to receive IOs, and to set up the
> block config space?

I think it can work.

Another thing I wonder is: do we consider more than one VDUSE
parentdev (or management dev)? This allows us to have separate devices
implemented via different processes.

If yes, the VDUSE ioctl needs to be extended to register/unregister a
parentdev.

Thanks

>
>> Thanks,
>> Yongji
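To make the flow above concrete, here is a minimal, hypothetical
userspace sketch of the scheme Yongji and Jason converge on: the
management tool creates the device ("vdpa dev add ... name foo2"), and
the VDUSE daemon then fetches an anonymous-inode fd bound to that name
via the proposed VDUSE_GET_DEVICE_FD ioctl, so enabling the device can
be gated on the fd being open. The control node "/dev/vduse", the
request struct, and the ioctl number are placeholders invented for
illustration, not a merged API.

/*
 * Illustration only: fetch the fd bound to a named vdpa device.
 * VDUSE_GET_DEVICE_FD is the ioctl proposed in this thread; the
 * control node, struct layout and ioctl number are placeholders.
 */
#include <fcntl.h>
#include <linux/ioctl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define VDUSE_NAME_MAX 256

struct vduse_get_fd {
	char name[VDUSE_NAME_MAX];  /* matches "vdpa dev add ... name foo2" */
};

#define VDUSE_GET_DEVICE_FD _IOW('v', 0x80, struct vduse_get_fd)

int main(void)
{
	struct vduse_get_fd req = { 0 };
	int ctrl, dev;

	ctrl = open("/dev/vduse", O_RDWR);  /* assumed control node */
	if (ctrl < 0) {
		perror("open");
		return 1;
	}

	strncpy(req.name, "foo2", sizeof(req.name) - 1);

	/* On success the kernel installs and returns an anon-inode fd
	 * bound to the device "foo2"; until a daemon holds this fd
	 * open, enabling the vdpa device would fail. */
	dev = ioctl(ctrl, VDUSE_GET_DEVICE_FD, &req);
	if (dev < 0) {
		perror("VDUSE_GET_DEVICE_FD");
		close(ctrl);
		return 1;
	}

	/* ... service config/datapath requests on 'dev' ... */
	close(dev);
	close(ctrl);
	return 0;
}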
Parav Pandit
2020-Dec-02 06:24 UTC
[External] Re: [PATCH 0/7] Introduce vdpa management tool
> From: Jason Wang <jasowang at redhat.com>
> Sent: Wednesday, December 2, 2020 11:21 AM
>
> On 2020/12/2 12:53, Parav Pandit wrote:
> [...]
> > Or maybe the backend connects to the created vdpa device once it
> > is bound to the driver.
> > Can vduse attach to the created vdpa block device through the char
> > device and establish the channel to receive IOs, and to set up the
> > block config space?
>
> I think it can work.
>
> Another thing I wonder is: do we consider more than one VDUSE
> parentdev (or management dev)? This allows us to have separate
> devices implemented via different processes.

Multiple parentdevs should be possible per driver.
For example, mlx5_vdpa.ko will create multiple parent devs, one for
each PCI VF/SF.

vdpa dev add can certainly use one parent/mgmt dev to create multiple
vdpa devices, so I am not sure why we would need multiple parent devs
for that. I guess there is just one parent/mgmt dev for VDUSE.

What would each mgmtdev do differently? Would the demux of IOs and
events be at the individual char dev level?

> If yes, the VDUSE ioctl needs to be extended to register/unregister
> a parentdev.
>
> Thanks
>
>
> >
> >> Thanks,
> >> Yongji
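Parav's "one parent dev per PCI VF/SF" point can be sketched in code.
The hypothetical fragment below shows a driver registering a separate
management device for each probed function, loosely based on the
vdpa_mgmtdev_register() interface this series introduces; the ops
signatures follow my reading of the proposal and may differ across
revisions, and all my_* names are invented.

/*
 * Hypothetical sketch (not from the patches under discussion): one
 * parent/mgmt dev per PCI VF/SF, so a single .ko can own many
 * parentdevs that "vdpa dev add" can target individually.
 */
#include <linux/mod_devicetable.h>
#include <linux/vdpa.h>
#include <linux/virtio_ids.h>

struct my_vf_ctx {
	struct vdpa_mgmt_dev mgmtdev;	/* one parent dev per VF/SF */
};

/* "vdpa dev add parentdev <this VF> type net name <name>" lands here. */
static int my_dev_add(struct vdpa_mgmt_dev *mdev, const char *name)
{
	/* Allocate the vdpa device 'name' on this VF and register it
	 * on the vdpa bus. */
	return 0;
}

static void my_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
{
	/* Unregister and free the vdpa device created in my_dev_add(). */
}

static const struct vdpa_mgmtdev_ops my_mgmtdev_ops = {
	.dev_add = my_dev_add,
	.dev_del = my_dev_del,
};

static struct virtio_device_id my_id_table[] = {
	{ VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
	{ 0 },
};

/* Called from the VF/SF probe(): each probed function becomes its own
 * parent/mgmt dev visible to the vdpa management tool. */
static int my_vf_expose_mgmtdev(struct device *dev, struct my_vf_ctx *ctx)
{
	ctx->mgmtdev.device = dev;
	ctx->mgmtdev.ops = &my_mgmtdev_ops;
	ctx->mgmtdev.id_table = my_id_table;
	return vdpa_mgmtdev_register(&ctx->mgmtdev);
}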