On Mon, Apr 11, 2016 at 03:54:08PM +0300, Michael S. Tsirkin wrote:> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote: > > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote: > > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote: > > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6). > > > > > > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but > > > > first I want to share the latest version of the code.??Several people are > > > > playing with vsock now so sharing the latest code should avoid duplicate work. > > > > > > Thanks for this, I've been using it in my project and it mostly seems > > > fine. > > > > > > One wrinkle I came across, which I'm not sure if it is by design or a > > > problem is that I can see this sequence coming from the guest (with > > > other activity in between): > > > > > > ? ? 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX > > > ????2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX > > > ????3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RXHow did you trigger this sequence? I'd like to reproduce it.> > > I orignally had my backend close things down at #2, however this meant > > > that when #3 arrived it was for a non-existent socket (or, worse, an > > > active one if the ports got reused). I checked v5 of the spec > > > proposal[0] which says: > > > If these bits are set and there are no more virtqueue buffers > > > pending the socket is disconnected. > > > > > > but I'm not entirely sure if this behaviour contradicts this or not > > > (the bits have both been set at #2, but not at the same time). > > > > > > BTW, how does one tell if there are no more virtqueue buffers pending > > > or not while processing the op? > > > > #2 is odd. The shutdown bits are sticky so they cannot be cleared once > > set. I would have expected just #1 and #3. The behavior you observe > > look like a bug. > > > > The spec text does not convey the meaning of OP_SHUTDOWN well. > > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible > > for this connection. "there are no more virtqueue buffers pending the > > socket" really means that this isn't an immediate close from the > > perspective of the application. If the application still has unread rx > > buffers then the socket stays readable until the rx data has been fully > > read. > > Yes but you also wrote: > If these bits are set and there are no more virtqueue buffers > pending the socket is disconnected. > > how does remote know that there are no buffers pending and so it's safe > to reuse the same source/destination address now? Maybe destination > should send RST at that point?You are right, the source/destination address could be reused while the remote still has the connection in their table. Connection establishment would fail with a RST reply. I can think of two solutions: 1. Implementations must remove connections from their table as soon as SHUTDOWN_TX|SHUTDOWN_RX is received. This way the source/destination address tuple can be reused immediately, i.e. new connections with the same source/destination would be possible while an application is still draining the receive buffers of an old connection. 2. Extend the connection lifecycle so that an A->B SHUTDOWN_TX|SHUTDOWN_RX must be followed by a by a B->A RST to close a connection. This way the source/destination address is only in use once at a time. Option #2 seems safer because there is no overlap in source/destination address usage. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160412/70774bc7/attachment.sig>
For some reason your mails in this thread only appear in the gmail web UI and not on the IMAP version of my mailbox (my own and Michael's mails are fine). So I'm replying via the web interface, sorry for the inevitable formatting mess :-/ I've CCd another mailbox in the hopes of getting your mails in that IMAP folder instead/aswell so I can avoid this next time. On 12 April 2016 at 14:59, Stefan Hajnoczi <stefanha at redhat.com> wrote:> > > > > One wrinkle I came across, which I'm not sure if it is by design or a > > > > problem is that I can see this sequence coming from the guest (with > > > > other activity in between): > > > > > > > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX > > > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX > > > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX > > How did you trigger this sequence? I'd like to reproduce it. >Nothing magic. I've written some logging into my backend and captured the result for a simple backend initiated connection. In the log "TX" and "RX" indicate the thread doing the processing (with "TX" being the one which processes the guest's TX ring, i.e. data coming from the guest to the host). "<=" indicates a buffer going from guest to host and "=>" is from host to guest. NB that guest to host replies are queued synchronously by the TX thread onto the RX ring which is why the somewhat odd looking "TX: =>" combination can occur. A host initiated connection also happens from the TX thread in the same way. The trace is of a simple request response (which both fit in one buffer in each direction), the lines without an "?X:" prefix are my annotations/guesses as to what is going on: TX: =>SRC:00000002.00010002 DST:00000003.00000948 TX: LEN:00000000 TYPE:0001 OP:1=REQUEST TX: FLAGS:00000000 BUF_ALLOC:00008000 FWD_CNT:00000000 TX: <=SRC:00000003.00000948 DST:00000002.00010002 TX: LEN:00000000 TYPE:0001 OP:2=RESPONSE TX: FLAGS:00000000 BUF_ALLOC:00040000 FWD_CNT:00000000 REQUEST + RESPONSE == Channel open successfully RX: =>SRC:00000002.00010002 DST:00000003.00000948 RX: LEN:0000005e TYPE:0001 OP:5=RW RX: FLAGS:00000000 BUF_ALLOC:00008000 FWD_CNT:00000000 Host sends a request to the guest TX: <=SRC:00000003.00000948 DST:00000002.00010002 TX: LEN:00000000 TYPE:0001 OP:6=CREDIT_UPDATE TX: FLAGS:00000000 BUF_ALLOC:00040000 FWD_CNT:0000005e Guest replies with a credit update TX: <=SRC:00000003.00000948 DST:00000002.00010002 TX: LEN:00000091 TYPE:0001 OP:5=RW TX: FLAGS:00000000 BUF_ALLOC:00040000 FWD_CNT:0000005e Guest replies with the answer to the request RX: =>SRC:00000002.00010002 DST:00000003.00000948 RX: LEN:00000000 TYPE:0001 OP:4=SHUTDOWN RX: FLAGS:00000002 BUF_ALLOC:00008000 FWD_CNT:00000091 Host has sent its only request, so host app must have done shutdown(SHUT_WR) I suppose and host therefore sends SHUTDOWN_TX. TX: <=SRC:00000003.00000948 DST:00000002.00010002 TX: LEN:00000000 TYPE:0001 OP:4=SHUTDOWN TX: FLAGS:00000001 BUF_ALLOC:00040000 FWD_CNT:0000005e Guest SHUTDOWN_RX. I'm not sure if this is a direct kernel response to the SHUTDOWN_TX or if the application inside the guest saw an EOF when reading the socket and did the corresponding shutdown(SHUT_RD). TX: <=SRC:00000003.00000948 DST:00000002.00010002 TX: LEN:00000000 TYPE:0001 OP:4=SHUTDOWN TX: FLAGS:00000002 BUF_ALLOC:00040000 FWD_CNT:0000005e Guest SHUTDOWN_TX, I presume that having sent the only response it is going to it then does shutdown(SHUT_WR). TX: <=SRC:00000003.00000948 DST:00000002.00010002 TX: LEN:00000000 TYPE:0001 OP:4=SHUTDOWN TX: FLAGS:00000003 BUF_ALLOC:00040000 FWD_CNT:0000005e Guest shuts down both directions. Perhaps the guest end is turning shutdown(foo) directly into a vsock message without or-ing in the current state?> > > I orignally had my backend close things down at #2, however this meant > > > > that when #3 arrived it was for a non-existent socket (or, worse, an > > > > active one if the ports got reused). I checked v5 of the spec > > > > proposal[0] which says: > > > > If these bits are set and there are no more virtqueue buffers > > > > pending the socket is disconnected. > > > > > > > > but I'm not entirely sure if this behaviour contradicts this or not > > > > (the bits have both been set at #2, but not at the same time). > > > > > > > > BTW, how does one tell if there are no more virtqueue buffers pending > > > > or not while processing the op? > > > > > > #2 is odd. The shutdown bits are sticky so they cannot be cleared once > > > set. I would have expected just #1 and #3. The behavior you observe > > > look like a bug. > > > > > > The spec text does not convey the meaning of OP_SHUTDOWN well. > > > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible > > > for this connection. "there are no more virtqueue buffers pending the > > > socket" really means that this isn't an immediate close from the > > > perspective of the application. If the application still has unread rx > > > buffers then the socket stays readable until the rx data has been fully > > > read. > > > > Yes but you also wrote: > > If these bits are set and there are no more virtqueue buffers > > pending the socket is disconnected. > > > > how does remote know that there are no buffers pending and so it's safe > > to reuse the same source/destination address now? Maybe destination > > should send RST at that point? > > You are right, the source/destination address could be reused while the > remote still has the connection in their table. Connection > establishment would fail with a RST reply. > > I can think of two solutions: > > 1. Implementations must remove connections from their table as soon as > SHUTDOWN_TX|SHUTDOWN_RX is received. This way the source/destination > address tuple can be reused immediately, i.e. new connections with > the same source/destination would be possible while an application is > still draining the receive buffers of an old connection. > > 2. Extend the connection lifecycle so that an A->B > SHUTDOWN_TX|SHUTDOWN_RX must be followed by a by a B->A RST to close > a connection. This way the source/destination address is only in use > once at a time. > > Option #2 seems safer because there is no overlap in source/destination > address usage. >I agree (I hadn't seen this when I replied to Michael similar suggestion). Ian. (sorry again for the formatting) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160412/a2bb47cc/attachment-0001.html>
On Tue, Apr 12, 2016 at 05:37:54PM +0100, Ian Campbell wrote:> Perhaps the guest end is turning shutdown(foo) directly into a vsock > message without or-ing in the current state?Yes, you are right: lock_sock(sk); sk->sk_shutdown |= mode; sk->sk_state_change(sk); release_sock(sk); if (sk->sk_type == SOCK_STREAM) { sock_reset_flag(sk, SOCK_DONE); vsock_send_shutdown(sk, mode); Although sk_shutdown is ORed correctly, vsock_send_shutdown() is called with just the shutdown() argument. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160413/23ec3144/attachment.sig>