On Mon, Jan 30, 2023 at 7:54 PM Eli Cohen <elic at nvidia.com> wrote:
>
>
> On 30/01/2023 13:34, Michael S. Tsirkin wrote:
> > On Mon, Jan 30, 2023 at 12:01:23PM +0200, Eli Cohen wrote:
> >> On 30/01/2023 10:19, Jason Wang wrote:
> >>> Hi Eli:
> >>>
> >>> On Mon, Jan 23, 2023 at 1:59 PM Eli Cohen <elic at nvidia.com> wrote:
> >>>> VDPA allows hardware drivers to propagate interrupts from the hardware
> >>>> directly to the vCPU used by the guest. In a typical implementation, the
> >>>> hardware driver will assign the interrupt vectors to the virtqueues and
> >>>> report this information back through the get_vq_irq() callback defined in
> >>>> struct vdpa_config_ops.
> >>>>
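For reference, a minimal sketch of how a parent driver could implement this
callback; the my_vdpa_* names and fields are hypothetical, not mlx5_vdpa's
actual code:

#include <linux/vdpa.h>

/* Hypothetical per-device state; only what the sketch needs. */
struct my_vdpa_vq {
        bool has_vector;        /* got a dedicated MSI-X vector */
        int irq;                /* Linux irq number for that vector */
};

struct my_vdpa_dev {
        struct vdpa_device vdev;
        struct my_vdpa_vq vqs[16];
};

/* Report the Linux irq backing a data VQ so the bus driver (vhost-vDPA)
 * can set up interrupt bypassing; a negative error means the VQ has no
 * dedicated vector and relies on the callback mechanism instead.
 */
static int my_vdpa_get_vq_irq(struct vdpa_device *vdev, u16 idx)
{
        struct my_vdpa_dev *ndev =
                container_of(vdev, struct my_vdpa_dev, vdev);

        if (!ndev->vqs[idx].has_vector)
                return -EOPNOTSUPP;

        return ndev->vqs[idx].irq;
}
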
> >>>> Interrupt vectors could be a scarce resource and may be limited. For such
> >>>> cases, we can let the administrator, through the vdpa tool, set the policy
> >>>> defining how to distribute the available vectors amongst the data
> >>>> virtqueues.
> >>>>
> >>>> The following policies are proposed:
> >>>>
> >>>> 1. First come, first served. Assign a vector to each data virtqueue by
> >>>>    the virtqueue index. Virtqueues which could not be assigned a
> >>>>    dedicated vector would use the hardware driver to propagate
> >>>>    interrupts using the available callback mechanism.
> >>>>
> >>>>    vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all
> >>>>
> >>>>    This is the default mode and works even if "int=all" was not
> >>>>    specified.
> >>>>
> >>>> 2. Use round-robin distribution so virtqueues could share vectors.
> >>>>    vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all intmode=share
> >>>>
> >>>> 3. Assign vectors to RX virtqueues only.
> >>>> 3.1 Do not share vectors
> >>>>     vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx
> >>>> 3.2 Share vectors
> >>>>     vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx intmode=share
> >>>>
> >>>> 4. Assign vectors to TX virtqueues only. Can share or not, like rx.
> >>>> 5. Fail device creation if the number of vectors cannot be fulfilled.
> >>>>    vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 max_vq_pairs 8 int=rx intnum=8
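A minimal sketch of how a parent driver could apply policy 1 above ("first
come, first served"), reusing the hypothetical my_vdpa_dev layout from the
earlier sketch; irqs[] would hold the Linux irq numbers of the vectors the
driver managed to allocate:

/* Hand out dedicated vectors by VQ index until they run out; the remaining
 * VQs keep using the callback mechanism.
 */
static void my_vdpa_distribute_vectors(struct my_vdpa_dev *ndev,
                                       const int *irqs, int nvec,
                                       int ndatavqs)
{
        int i;

        for (i = 0; i < ndatavqs; i++) {
                ndev->vqs[i].has_vector = i < nvec;
                ndev->vqs[i].irq = i < nvec ? irqs[i] : -EINVAL;
        }
}
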
> >>> I wonder:
> >>>
> >>> 1) how the administrator can know if there are sufficient resources for
> >>> one of the above policies.
> >> There's no established way to know. The idea is to use whatever there is,
> >> assuming interrupt bypassing is always better than the callback mechanism.
> >>> 2) how does the administrator know which policy is the best assuming
> >>> the resources are sufficient? (E.g. vectors to RX only or vectors to TX
> >>> only)
> >> I don't think there's a rule of thumb here; he needs to experiment to see
> >> what works best for him.
> >>> If it requires a vendor-specific way or knowledge, I believe it's
> >>> better to code them in:
> >>>
> >>> 1) the vDPA parent or
> >>> 2) the underlying management tool or drivers
> >>>
> >>> Thanks
> >> I was also wondering about the current mechanism we have. The hardware
> >> driver reports an irq number for each VQ.
> >>
> >> The guest driver sees a virtio pci device with as many MSI-X vectors as
> >> there are virtqueues.
> >>
> >> Suppose the hardware driver provided only 5 interrupt vectors while there
> >> are 16 VQs.
> >>
> >> Which MSI-X vector at the guest really gets a posted interrupt and which
> >> one uses a callback handled at the hardware driver?
> > Not sure I understand.
> > If you get a single interrupt from hardware, callback or posted,
> > you can only drive one interrupt to the guest, no?
> >
> For every VQ I have a chance to assign an interrupt vector.
>
> Consider this scenario:
>
> mlx5_vdpa is created with 16 data virtqueues.
>
> mlx5_vdpa associates VQ0 with an interrupt vector. The rest of the VQs
> don't get assigned vectors and use the old callback mechanism.
>
> When you go into the VM and run lspci, you will see the device has 16 MSI-X
> vectors.
Note that the guest MSI-X vectors are emulated by software; you can change
their number by specifying the "vectors=X" parameter of virtio-pci. Those
MSI-X vectors are backed by eventfds which Qemu creates and passes to both
KVM and vhost-vDPA.
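For example (illustrative command line; the exact netdev/device options and
the vector count depend on the setup), something like:

    qemu-system-x86_64 ... \
        -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0 \
        -device virtio-net-pci,netdev=vdpa0,mq=on,vectors=18

would give the guest's virtio-net device 18 MSI-X vectors, independent of
how many hardware vectors the parent driver managed to assign.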
>
> Do you know which of the MSI-X vectors on the guest is the vector I
> assigned for VQ0?
The mapping from the guest MSI-X vector to VQ0 is done via the
queue_msix_vector field in the capability, and it is under the control of
the guest virtio-pci driver.

The mapping from host MSI-X to guest MSI-X (required for the posted
interrupt) is done by matching the eventfd between KVM and vhost-vDPA when
the eventfds are assigned. So assuming:

1) the guest driver uses guest-seen MSI-X vector X for vq0
2) the host driver reports irqX via get_vq_irq(0)

then the host MSI-X corresponding to irqX is mapped to vq0 (via guest-seen
MSI-X vector X) through a posted interrupt when that is possible. If the
posted interrupt can't work for some reason, the code will fall back to the
vq_callback, which is a simple eventfd_signal().
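For completeness, a minimal sketch of that fallback path on the parent
driver side (again with hypothetical my_vdpa_* names): when a VQ did not get
a dedicated vector, the driver's own interrupt handler just fires the vq
callback registered via set_vq_cb(), which vhost-vDPA turns into an
eventfd_signal() toward KVM/the guest:

#include <linux/interrupt.h>
#include <linux/vdpa.h>

/* Per-VQ context: the callback the bus driver registered via set_vq_cb(). */
struct my_vdpa_vq_irq_ctx {
        struct vdpa_callback cb;
};

/* Handler for a shared/driver-owned vector: no posted interrupt here, so
 * simply invoke the vq callback; on the vhost-vDPA side this ends up as an
 * eventfd_signal() that injects the software-emulated guest MSI-X vector.
 */
static irqreturn_t my_vdpa_vq_irq_handler(int irq, void *data)
{
        struct my_vdpa_vq_irq_ctx *ctx = data;

        if (ctx->cb.callback)
                return ctx->cb.callback(ctx->cb.private);

        return IRQ_HANDLED;
}
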
Thanks
>
> >>>>
> >>>>
>