Steven Luong (sluong)
2018-Dec-13 23:24 UTC
kernel vhost demands an interrupt from guest when the ring is full in order to enable guest to submit new packets to the queue
Folks,

We came across a memory race condition between the VPP vhost driver and the kernel vhost. VPP is running a tap interface over the vhost backend. In this case, VPP is acting in vhost driver mode and the kernel vhost is acting in vhost device mode.

In the kernel vhost's TX traffic direction, which is VPP's RX traffic direction, kernel vhost is the producer and VPP is the consumer. Kernel vhost places traffic in kernel vhost's TX vring; VPP removes traffic from VPP's RX vring. It is inevitable that the vring may become full under heavy traffic, and the kernel vhost may have to stop and wait for the guest (VPP) to empty the vring and refill it with descriptors. When that happens, kernel vhost clears the bit in the vring's used flags to request an interrupt/notification. Due to a shared-memory race condition, VPP may miss the clearing of the vring's used flags by kernel vhost and not send kernel vhost an interrupt after VPP empties and refills the vring with new descriptors. Unfortunately, this missed notification causes the kernel vhost to be stuck, because once the kernel vhost is waiting for an interrupt/notification from the guest, only an interrupt/notification from the guest can re-enable it to submit new packets to the vring.

This design seems vulnerable. Should the kernel vhost totally depend on the notification from the guest? Quoting the text from

http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

    /* The device uses this in used->flags to advise the driver: don't kick me
     * when you add a buffer. It's unreliable, so it's simply an
     * optimization. */
    #define VIRTQ_USED_F_NO_NOTIFY  1

I interpret that the notification is simply an optimization, not a reliable notification mechanism, so the kernel vhost should not bet the farm on it.

We encounter the same race condition in kernel vhost's RX traffic direction, which causes the kernel vhost to be stuck due to a missed interrupt/notification, although it is less frequent.

Steven
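For context on what "sending kernel vhost an interrupt" means here: the notification is not a hardware interrupt but a write to the vring's kick eventfd, guarded by the suppression bit described above. Below is a minimal sketch of that driver-side decision, assuming the kick fd was registered with the VHOST_SET_VRING_KICK ioctl; maybe_kick() and the direct pointer to used->flags are illustrative placeholders, not VPP code.

    /* Illustrative sketch only -- not VPP code.  Shows that the "interrupt"
     * to kernel vhost is an eventfd write, and that it is skipped whenever
     * the device has VIRTQ_USED_F_NO_NOTIFY set in used->flags. */
    #include <stdint.h>
    #include <unistd.h>

    #define VIRTQ_USED_F_NO_NOTIFY 1

    static void maybe_kick(int kick_fd, volatile uint16_t *used_flags)
    {
        /* The device sets the bit to say "don't kick me"; it clears the bit
         * when it runs dry and wants to be woken up again. */
        if (!(*used_flags & VIRTQ_USED_F_NO_NOTIFY)) {
            uint64_t one = 1;
            (void)write(kick_fd, &one, sizeof(one));  /* the "interrupt" */
        }
    }

The race in question is about the ordering between publishing new descriptors and performing this flags check.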
Michael S. Tsirkin
2018-Dec-17 22:55 UTC
kernel vhost demands an interrupt from guest when the ring is full in order to enable guest to submit new packets to the queue
On Thu, Dec 13, 2018 at 11:24:28PM +0000, Steven Luong (sluong) via Virtualization wrote:
> Folks,
>
> We came across a memory race condition between the VPP vhost driver and the kernel
> vhost. VPP is running a tap interface over the vhost backend. In this case, VPP is
> acting in vhost driver mode and the kernel vhost is acting in vhost device mode.
>
> In the kernel vhost's TX traffic direction, which is VPP's RX traffic direction,
> kernel vhost is the producer and VPP is the consumer.

Looking at the vhost net kernel code, it seems to use the same terminology as
virtio net. Can you please clarify which ring number you use, 0 or 1?
I will assume ring 0 below.

> Kernel vhost places traffic in kernel vhost's TX vring; VPP removes traffic
> from VPP's RX vring.

So VPP makes buffers available and vhost kernel uses them?

> It is inevitable that the vring may become full under heavy traffic, and the kernel
> vhost may have to stop and wait for the guest (VPP) to empty the vring and
> refill it with descriptors. When that happens, kernel vhost clears the bit in
> the vring's used flags to request an interrupt/notification. Due to a
> shared-memory race condition, VPP may miss the clearing of the vring's used
> flags by kernel vhost and not send kernel vhost an interrupt after VPP
> empties and refills the vring with new descriptors.

So kernel vhost's wakeups are signalled through eventfd - I assume this is
what you mean by an interrupt?

To prevent the race that you describe, vhost has this code:

                /* OK, now we need to know about added descriptors. */
                if (!headcount) {
                        if (unlikely(busyloop_intr)) {
                                vhost_poll_queue(&vq->poll);
                        } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                                /* They have slipped one in as we were
                                 * doing that: check again. */
                                vhost_disable_notify(&net->dev, vq);
                                continue;
                        }
                        /* Nothing new? Wait for eventfd to tell us

So the ring should be re-checked after notifications are enabled. If the ring
is no longer empty, vhost will proceed to handle the ring.

This code has been around for many years. Thus if VPP doesn't work with it
then I suspect it does something differently, e.g. is it possible that it is
missing a memory barrier after writing out the available buffers and before
checking the flags?

> Unfortunately, this missed notification causes the kernel vhost to be stuck,
> because once the kernel vhost is waiting for an interrupt/notification from
> the guest, only an interrupt/notification from the guest can re-enable it to
> submit new packets to the vring. This design seems vulnerable.

The protocol right now relies on notifications never being lost.

I can imagine an alternative where the device also re-checks the rings
periodically, but that would involve running timers host side, which is only
possible if there is a free processor that can handle them. Further, it will
still lead to stalls and packet drops, which will be harder to debug.

That's just my $.02; please feel free to take it to the virtio TC, maybe
others will feel differently.

> Should the kernel vhost totally depend on the notification from the guest?
> Quoting the text from
>
> http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
>
> /* The device uses this in used->flags to advise the driver: don't kick me
>  * when you add a buffer. It's unreliable, so it's simply an
>  * optimization. */
> #define VIRTQ_USED_F_NO_NOTIFY  1
>
> I interpret that the notification is simply an optimization, not a reliable
> notification mechanism.

What was meant I think is that suppression is unreliable.

> So the kernel vhost should not bet the farm on it.
>
> We encounter the same race condition in kernel vhost's RX traffic direction,
> which causes the kernel vhost to be stuck due to a missed interrupt/notification,
> although it is less frequent.

For the reverse direction the spec actually has some pseudo-code and a
suggestion about handling that race:

    For optimal performance, a driver MAY disable used buffer notifications
    while processing the used ring, but beware the problem of missing
    notifications between emptying the ring and reenabling notifications.
    This is usually handled by re-checking for more used buffers after
    notifications are re-enabled:

        virtq_disable_used_buffer_notifications(vq);

        for (;;) {
                if (vq->last_seen_used != le16_to_cpu(virtq->used.idx)) {
                        virtq_enable_used_buffer_notifications(vq);
                        mb();

                        if (vq->last_seen_used != le16_to_cpu(virtq->used.idx))
                                break;

                        virtq_disable_used_buffer_notifications(vq);
                }

                struct virtq_used_elem *e = virtq.used->ring[vq->last_seen_used % vsz];
                process_buffer(e);
                vq->last_seen_used++;
        }

> Steven
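As a self-contained illustration of the same "enable notifications, then re-check the ring" pattern that the vhost excerpt and the spec pseudo-code above rely on, here is a condensed sketch of the device side. It is not the actual drivers/vhost code: device_poll_loop(), the vq layout, and the __sync_synchronize() barrier are simplified stand-ins for vhost_enable_notify()/vhost_disable_notify() and the real ring accessors.

    /* Condensed sketch of the device-side "enable notifications, then
     * re-check the avail ring" pattern.  Simplified stand-in, not
     * drivers/vhost code; field names and helpers are illustrative. */
    #include <stdint.h>

    #define VIRTQ_USED_F_NO_NOTIFY 1

    struct vq {
        volatile uint16_t *avail_idx;   /* written by the driver (guest/VPP) */
        volatile uint16_t *used_flags;  /* written by the device (vhost)     */
        uint16_t last_avail_idx;        /* device-private shadow copy        */
    };

    static void device_poll_loop(struct vq *vq)
    {
        for (;;) {
            if (vq->last_avail_idx != *vq->avail_idx) {
                /* Buffers available: consume them (omitted) ... */
                vq->last_avail_idx = *vq->avail_idx;
                continue;
            }

            /* Ring looks empty: clear NO_NOTIFY so the driver will kick us. */
            *vq->used_flags &= (uint16_t)~VIRTQ_USED_F_NO_NOTIFY;
            __sync_synchronize();  /* order the flag store before the idx re-read */

            /* Re-check: the driver may have slipped buffers in while we were
             * enabling notifications (what vhost_enable_notify()'s non-zero
             * return signals).  If so, suppress notifications again and go on. */
            if (vq->last_avail_idx != *vq->avail_idx) {
                *vq->used_flags |= VIRTQ_USED_F_NO_NOTIFY;
                continue;
            }

            /* Truly nothing new: block on the kick eventfd (omitted). */
            break;
        }
    }

The re-check after clearing the flag is what is meant to make the skipped driver-side kick harmless.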
Steven Luong (sluong)
2018-Dec-17 23:56 UTC
kernel vhost demands an interrupt from guest when the ring is full in order to enable guest to submit new packets to the queue
On 12/17/18, 2:55 PM, "Michael S. Tsirkin" <mst at redhat.com> wrote:

On Thu, Dec 13, 2018 at 11:24:28PM +0000, Steven Luong (sluong) via Virtualization wrote:
> Folks,
>
> We came across a memory race condition between the VPP vhost driver and the kernel
> vhost. VPP is running a tap interface over the vhost backend. In this case, VPP is
> acting in vhost driver mode and the kernel vhost is acting in vhost device mode.
>
> In the kernel vhost's TX traffic direction, which is VPP's RX traffic direction,
> kernel vhost is the producer and VPP is the consumer.

Looking at the vhost net kernel code, it seems to use the same terminology as
virtio net. Can you please clarify which ring number you use, 0 or 1?

<SVL> 0. </SVL>

I will assume ring 0 below.

> Kernel vhost places traffic in kernel vhost's TX vring; VPP removes traffic
> from VPP's RX vring.

So VPP makes buffers available and vhost kernel uses them?

<SVL> Correct. </SVL>

> It is inevitable that the vring may become full under heavy traffic, and the kernel
> vhost may have to stop and wait for the guest (VPP) to empty the vring and
> refill it with descriptors. When that happens, kernel vhost clears the bit in
> the vring's used flags to request an interrupt/notification. Due to a
> shared-memory race condition, VPP may miss the clearing of the vring's used
> flags by kernel vhost and not send kernel vhost an interrupt after VPP
> empties and refills the vring with new descriptors.

So kernel vhost's wakeups are signalled through eventfd - I assume this is
what you mean by an interrupt?

To prevent the race that you describe, vhost has this code:

                /* OK, now we need to know about added descriptors. */
                if (!headcount) {
                        if (unlikely(busyloop_intr)) {
                                vhost_poll_queue(&vq->poll);
                        } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                                /* They have slipped one in as we were
                                 * doing that: check again. */
                                vhost_disable_notify(&net->dev, vq);
                                continue;
                        }
                        /* Nothing new? Wait for eventfd to tell us

So the ring should be re-checked after notifications are enabled. If the ring
is no longer empty, vhost will proceed to handle the ring.

This code has been around for many years. Thus if VPP doesn't work with it
then I suspect it does something differently, e.g. is it possible that it is
missing a memory barrier after writing out the available buffers and before
checking the flags?

<SVL> Can the race condition be avoided totally? Here is the scenario.

t0: VPP refills the vring with new descriptors, but does not yet set avail->idx to make them available to kernel vhost.
t1: kernel vhost checks the vring and still sees no space available, but has not yet executed the line of code that clears used->flags.
t2: VPP executes sfence and sets avail->idx to make the new descriptors available to kernel vhost.
t3: VPP checks used->flags; it is still 1, so VPP skips sending the interrupt.
t4: kernel vhost clears used->flags to indicate that an interrupt is required. But VPP has just missed it.

The result is that kernel vhost gets stuck even though the vring is filled with new descriptors.

Steven </SVL>

> Unfortunately, this missed notification causes the kernel vhost to be stuck,
> because once the kernel vhost is waiting for an interrupt/notification from
> the guest, only an interrupt/notification from the guest can re-enable it to
> submit new packets to the vring. This design seems vulnerable.

The protocol right now relies on notifications never being lost.

I can imagine an alternative where the device also re-checks the rings
periodically, but that would involve running timers host side, which is only
possible if there is a free processor that can handle them. Further, it will
still lead to stalls and packet drops, which will be harder to debug.

That's just my $.02; please feel free to take it to the virtio TC, maybe
others will feel differently.

> Should the kernel vhost totally depend on the notification from the guest?
> Quoting the text from
>
> http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
>
> /* The device uses this in used->flags to advise the driver: don't kick me
>  * when you add a buffer. It's unreliable, so it's simply an
>  * optimization. */
> #define VIRTQ_USED_F_NO_NOTIFY  1
>
> I interpret that the notification is simply an optimization, not a reliable
> notification mechanism.

What was meant I think is that suppression is unreliable.

> So the kernel vhost should not bet the farm on it.
>
> We encounter the same race condition in kernel vhost's RX traffic direction,
> which causes the kernel vhost to be stuck due to a missed interrupt/notification,
> although it is less frequent.

For the reverse direction the spec actually has some pseudo-code and a
suggestion about handling that race:

    For optimal performance, a driver MAY disable used buffer notifications
    while processing the used ring, but beware the problem of missing
    notifications between emptying the ring and reenabling notifications.
    This is usually handled by re-checking for more used buffers after
    notifications are re-enabled:

        virtq_disable_used_buffer_notifications(vq);

        for (;;) {
                if (vq->last_seen_used != le16_to_cpu(virtq->used.idx)) {
                        virtq_enable_used_buffer_notifications(vq);
                        mb();

                        if (vq->last_seen_used != le16_to_cpu(virtq->used.idx))
                                break;

                        virtq_disable_used_buffer_notifications(vq);
                }

                struct virtq_used_elem *e = virtq.used->ring[vq->last_seen_used % vsz];
                process_buffer(e);
                vq->last_seen_used++;
        }

> Steven
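To make the t0..t4 interleaving above concrete: with a full barrier on both sides (between the driver's avail->idx store and its used->flags load, and between the device's used->flags clear and its avail->idx re-read), the two sequences form the usual Dekker/store-buffer pattern, so at least one side must observe the other's update - either VPP sees the cleared flag and kicks, or vhost's re-check sees the new index and keeps going without a kick. If the driver-side barrier between t2 and t3 is missing, both loads can read stale values and the queue wedges exactly as described. Here is a sketch of the two sequences under those assumptions; the helpers and global layout are hypothetical, not VPP or drivers/vhost code.

    /* Sketch of the two racing sequences from the t0..t4 scenario, with the
     * ordering each side needs.  Hypothetical names, not real driver code. */
    #include <stdint.h>

    #define VIRTQ_USED_F_NO_NOTIFY 1

    volatile uint16_t avail_idx;                            /* driver -> device */
    volatile uint16_t used_flags = VIRTQ_USED_F_NO_NOTIFY;  /* device -> driver */

    /* Driver (VPP) side: steps t0, t2, t3. */
    static int driver_publish(uint16_t new_idx)
    {
        /* t0: write the new descriptors (omitted). */
        __sync_synchronize();   /* descriptors before idx (the sfence at t2)   */
        avail_idx = new_idx;    /* t2: publish to the device                   */
        __sync_synchronize();   /* idx store BEFORE flags load; without this
                                 * barrier both sides can miss each other's
                                 * update (store-buffer/Dekker pattern)        */
        /* t3: kick only if the device asked for it. */
        return !(used_flags & VIRTQ_USED_F_NO_NOTIFY);  /* 1 => send eventfd kick */
    }

    /* Device (kernel vhost) side: steps t1, t4. */
    static int device_rearm_and_recheck(uint16_t last_seen_idx)
    {
        /* t1: the ring looked empty with notifications suppressed. */
        used_flags &= (uint16_t)~VIRTQ_USED_F_NO_NOTIFY;  /* t4: request a kick */
        __sync_synchronize();   /* flag store BEFORE idx re-read                */
        /* Re-check: if the driver published at t2, we see it here and keep
         * processing even though the t3 kick was skipped. */
        return avail_idx != last_seen_idx;   /* 1 => more work, don't sleep */
    }

With both barriers present, at least one of the two functions returns 1 on any interleaving of t0..t4; dropping the second __sync_synchronize() in driver_publish() reintroduces the lost-wakeup window described in this thread.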