thr3ads.net - Linux Virtualization - [RFC v5 0/5] Add virtio transport for AF

If this information is useful, please help other people find it:
Share via:

Stefan Hajnoczi

2016-Apr-12 13:59 UTC

[RFC v5 0/5] Add virtio transport for AF_VSOCK

On Mon, Apr 11, 2016 at 03:54:08PM +0300, Michael S. Tsirkin
wrote:> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > > This series is based on Michael Tsirkin's vhost branch
(v4.5-rc6).
> > > > 
> > > > I'm about to process Claudio Imbrenda's locking
fixes for virtio-vsock but
> > > > first I want to share the latest version of the
code.??Several people are
> > > > playing with vsock now so sharing the latest code should
avoid duplicate work.
> > > 
> > > Thanks for this, I've been using it in my project and it
mostly seems
> > > fine.
> > > 
> > > One wrinkle I came across, which I'm not sure if it is by
design or a
> > > problem is that I can see this sequence coming from the guest
(with
> > > other activity in between):
> > > 
> > > ? ? 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > ????2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > ????3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
How did you trigger this sequence?  I'd like to reproduce it.
> > > I orignally had my backend close things down at #2, however this
meant
> > > that when #3 arrived it was for a non-existent socket (or, worse,
an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > >     If these bits are set and there are no more virtqueue buffers
> > >     pending the socket is disconnected.
> > > 
> > > but I'm not entirely sure if this behaviour contradicts this
or not
> > > (the bits have both been set at #2, but not at the same time).
> > > 
> > > BTW, how does one tell if there are no more virtqueue buffers
pending
> > > or not while processing the op?
> > 
> > #2 is odd.  The shutdown bits are sticky so they cannot be cleared
once
> > set.  I would have expected just #1 and #3.  The behavior you observe
> > look like a bug.
> > 
> > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > for this connection.  "there are no more virtqueue buffers
pending the
> > socket" really means that this isn't an immediate close from
the
> > perspective of the application.  If the application still has unread
rx
> > buffers then the socket stays readable until the rx data has been
fully
> > read.
> 
> Yes but you also wrote:
> 	If these bits are set and there are no more virtqueue buffers
> 	pending the socket is disconnected.
> 
> how does remote know that there are no buffers pending and so it's safe
> to reuse the same source/destination address now?  Maybe destination
> should send RST at that point?
You are right, the source/destination address could be reused while the
remote still has the connection in their table.  Connection
establishment would fail with a RST reply.

I can think of two solutions:

1. Implementations must remove connections from their table as soon as
   SHUTDOWN_TX|SHUTDOWN_RX is received.  This way the source/destination
   address tuple can be reused immediately, i.e. new connections with
   the same source/destination would be possible while an application is
   still draining the receive buffers of an old connection.

2. Extend the connection lifecycle so that an A->B
   SHUTDOWN_TX|SHUTDOWN_RX must be followed by a by a B->A RST to close
   a connection.  This way the source/destination address is only in use
   once at a time.

Option #2 seems safer because there is no overlap in source/destination
address usage.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160412/70774bc7/attachment.sig>

Ian Campbell

2016-Apr-12 16:37 UTC

head link

[RFC v5 0/5] Add virtio transport for AF_VSOCK

For some reason your mails in this thread only appear in the gmail web UI
and not on the IMAP version of my mailbox (my own and Michael's mails are
fine).

So I'm replying via the web interface, sorry for the inevitable formatting
mess :-/

I've CCd another mailbox in the hopes of getting your mails in that IMAP
folder instead/aswell so I can avoid this next time.

On 12 April 2016 at 14:59, Stefan Hajnoczi <stefanha at redhat.com>
wrote:>
> > > > One wrinkle I came across, which I'm not sure if it is
by design or a
> > > > problem is that I can see this sequence coming from the
guest (with
> > > > other activity in between):
> > > >
> > > >     1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > >     2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > >     3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
>
> How did you trigger this sequence?  I'd like to reproduce it.
>
Nothing magic. I've written some logging into my backend and captured the
result for a simple backend initiated connection.

In the log "TX" and "RX" indicate the thread doing the
processing (with
"TX" being the one which processes the guest's TX ring, i.e. data
coming
from the guest to the host). "<=" indicates a buffer going from
guest to
host and "=>" is from host to guest. NB that guest to host replies
are
queued synchronously by the TX thread onto the RX ring which is why the
somewhat odd looking "TX: =>" combination can occur. A host
initiated
connection also happens from the TX thread in the same way.

The trace is of a simple request response (which both fit in one buffer in
each direction), the lines without an "?X:" prefix are my
annotations/guesses as to what is going on:

TX: =>SRC:00000002.00010002 DST:00000003.00000948
TX:   LEN:00000000 TYPE:0001 OP:1=REQUEST
TX:  FLAGS:00000000 BUF_ALLOC:00008000 FWD_CNT:00000000

TX: <=SRC:00000003.00000948 DST:00000002.00010002
TX:   LEN:00000000 TYPE:0001 OP:2=RESPONSE
TX:  FLAGS:00000000 BUF_ALLOC:00040000 FWD_CNT:00000000

REQUEST + RESPONSE == Channel open successfully

RX: =>SRC:00000002.00010002 DST:00000003.00000948
RX:   LEN:0000005e TYPE:0001 OP:5=RW
RX:  FLAGS:00000000 BUF_ALLOC:00008000 FWD_CNT:00000000

Host sends a request to the guest

TX: <=SRC:00000003.00000948 DST:00000002.00010002
TX:   LEN:00000000 TYPE:0001 OP:6=CREDIT_UPDATE
TX:  FLAGS:00000000 BUF_ALLOC:00040000 FWD_CNT:0000005e

Guest replies with a credit update

TX: <=SRC:00000003.00000948 DST:00000002.00010002
TX:   LEN:00000091 TYPE:0001 OP:5=RW
TX:  FLAGS:00000000 BUF_ALLOC:00040000 FWD_CNT:0000005e

Guest replies with the answer to the request

RX: =>SRC:00000002.00010002 DST:00000003.00000948
RX:   LEN:00000000 TYPE:0001 OP:4=SHUTDOWN
RX:  FLAGS:00000002 BUF_ALLOC:00008000 FWD_CNT:00000091

Host has sent its only request, so host app must have done
shutdown(SHUT_WR) I suppose and host therefore sends SHUTDOWN_TX.

TX: <=SRC:00000003.00000948 DST:00000002.00010002
TX:   LEN:00000000 TYPE:0001 OP:4=SHUTDOWN
TX:  FLAGS:00000001 BUF_ALLOC:00040000 FWD_CNT:0000005e

Guest SHUTDOWN_RX. I'm not sure if this is a direct kernel response to the
SHUTDOWN_TX or if the application inside the guest saw an EOF when reading
the socket and did the corresponding shutdown(SHUT_RD).

TX: <=SRC:00000003.00000948 DST:00000002.00010002
TX:   LEN:00000000 TYPE:0001 OP:4=SHUTDOWN
TX:  FLAGS:00000002 BUF_ALLOC:00040000 FWD_CNT:0000005e

Guest SHUTDOWN_TX, I presume that having sent the only response it is going
to it then does shutdown(SHUT_WR).

TX: <=SRC:00000003.00000948 DST:00000002.00010002
TX:   LEN:00000000 TYPE:0001 OP:4=SHUTDOWN
TX:  FLAGS:00000003 BUF_ALLOC:00040000 FWD_CNT:0000005e

Guest shuts down both directions.

Perhaps the guest end is turning shutdown(foo) directly into a vsock
message without or-ing in the current state?
> > > I orignally had my backend close things down at #2, however this
meant
> > > > that when #3 arrived it was for a non-existent socket (or,
worse, an
> > > > active one if the ports got reused). I checked v5 of the
spec
> > > > proposal[0] which says:
> > > >     If these bits are set and there are no more virtqueue
buffers
> > > >     pending the socket is disconnected.
> > > >
> > > > but I'm not entirely sure if this behaviour contradicts
this or not
> > > > (the bits have both been set at #2, but not at the same
time).
> > > >
> > > > BTW, how does one tell if there are no more virtqueue
buffers pending
> > > > or not while processing the op?
> > >
> > > #2 is odd.  The shutdown bits are sticky so they cannot be
cleared once
> > > set.  I would have expected just #1 and #3.  The behavior you
observe
> > > look like a bug.
> > >
> > > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is
possible
> > > for this connection.  "there are no more virtqueue buffers
pending the
> > > socket" really means that this isn't an immediate close
from the
> > > perspective of the application.  If the application still has
unread rx
> > > buffers then the socket stays readable until the rx data has been
fully
> > > read.
> >
> > Yes but you also wrote:
> >       If these bits are set and there are no more virtqueue buffers
> >       pending the socket is disconnected.
> >
> > how does remote know that there are no buffers pending and so it's
safe
> > to reuse the same source/destination address now?  Maybe destination
> > should send RST at that point?
>
> You are right, the source/destination address could be reused while the
> remote still has the connection in their table.  Connection
> establishment would fail with a RST reply.
>
> I can think of two solutions:
>
> 1. Implementations must remove connections from their table as soon as
>    SHUTDOWN_TX|SHUTDOWN_RX is received.  This way the source/destination
>    address tuple can be reused immediately, i.e. new connections with
>    the same source/destination would be possible while an application is
>    still draining the receive buffers of an old connection.
>
> 2. Extend the connection lifecycle so that an A->B
>    SHUTDOWN_TX|SHUTDOWN_RX must be followed by a by a B->A RST to close
>    a connection.  This way the source/destination address is only in use
>    once at a time.
>
> Option #2 seems safer because there is no overlap in source/destination
> address usage.
>
I agree (I hadn't seen this when I replied to Michael similar suggestion).

Ian. (sorry again for the formatting)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160412/a2bb47cc/attachment-0001.html>

Stefan Hajnoczi

2016-Apr-13 13:38 UTC

head link

[RFC v5 0/5] Add virtio transport for AF_VSOCK

On Tue, Apr 12, 2016 at 05:37:54PM +0100, Ian Campbell
wrote:> Perhaps the guest end is turning shutdown(foo) directly into a vsock
> message without or-ing in the current state?
Yes, you are right:

  lock_sock(sk);
  sk->sk_shutdown |= mode;
  sk->sk_state_change(sk);
  release_sock(sk);

  if (sk->sk_type == SOCK_STREAM) {
      sock_reset_flag(sk, SOCK_DONE);
      vsock_send_shutdown(sk, mode);

Although sk_shutdown is ORed correctly, vsock_send_shutdown() is called with
just the shutdown() argument.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160413/23ec3144/attachment.sig>

Seemingly Similar Threads

Search for more apparently analagous threads

Linux Virtualization - Apr 2016 - [RFC v5 0/5] Add virtio transport for AF_VSOCK

[RFC v5 0/5] Add virtio transport for AF_VSOCK

[RFC v5 0/5] Add virtio transport for AF_VSOCK

[RFC v5 0/5] Add virtio transport for AF_VSOCK

Seemingly Similar Threads