Michael S. Tsirkin
2021-Feb-07 16:20 UTC
[RFC PATCH v4 00/17] virtio/vsock: introduce SOCK_SEQPACKET support
On Sun, Feb 07, 2021 at 06:12:56PM +0300, Arseny Krasnov wrote:
> This patchset implements support of SOCK_SEQPACKET for virtio
> transport.
> As SOCK_SEQPACKET guarantees to preserve record boundaries, two new
> packet operations were added: the first marks the start of a record
> and the second marks its end (SEQ_BEGIN and SEQ_END below). Both
> operations carry metadata to maintain boundaries and payload
> integrity. The metadata is introduced by adding a special header with
> two fields, message count and message length:
>
> struct virtio_vsock_seq_hdr {
>     __le32 msg_cnt;
>     __le32 msg_len;
> } __attribute__((packed));
>
> This header is transmitted as the payload of SEQ_BEGIN and SEQ_END
> packets (the buffer of the second virtio descriptor in the chain), in
> the same way as data is transmitted in RW packets. The payload was
> chosen as the buffer for this header to avoid touching the first
> virtio buffer, which carries the packet header, because someone could
> check that the size of that buffer equals the size of the packet
> header. To send a record, a packet with the start marker is sent
> first (its header contains the record length and the counter), then
> the counter is incremented and all data is sent as usual 'RW' packets,
> and finally SEQ_END is sent (it also carries the message counter,
> which is the SEQ_BEGIN counter + 1). After sending SEQ_END the counter
> is incremented again. On the receiver's side, the record length is
> known from the packet with the start-of-record marker. To check that
> no packets were dropped by the transport, the counters of two
> sequential SEQ_BEGIN and SEQ_END packets are checked (the SEQ_END
> counter must be bigger than the SEQ_BEGIN counter by 1) and the
> length of data between the two markers is compared to the length in
> the SEQ_BEGIN header.
> Now, as packets of one socket are reordered neither on the vsock
> nor on the vhost transport layer, such markers allow the original
> record to be restored on the receiver's side. If the user's buffer is
> smaller than the record length, then all out-of-size data is dropped.
> The maximum length of a datagram is not limited, just as with a
> stream socket, because the same credit logic is used. The difference
> from a stream socket is that the user is not woken up until the whole
> record is received or an error occurs. The implementation also
> supports the 'MSG_EOR' and 'MSG_TRUNC' flags.
> Tests are also implemented.
>
> Arseny Krasnov (17):
>   af_vsock: update functions for connectible socket
>   af_vsock: separate wait data loop
>   af_vsock: separate receive data loop
>   af_vsock: implement SEQPACKET receive loop
>   af_vsock: separate wait space loop
>   af_vsock: implement send logic for SEQPACKET
>   af_vsock: rest of SEQPACKET support
>   af_vsock: update comments for stream sockets
>   virtio/vsock: dequeue callback for SOCK_SEQPACKET
>   virtio/vsock: fetch length for SEQPACKET record
>   virtio/vsock: add SEQPACKET receive logic
>   virtio/vsock: rest of SOCK_SEQPACKET support
>   virtio/vsock: setup SEQPACKET ops for transport
>   vhost/vsock: setup SEQPACKET ops for transport
>   vsock_test: add SOCK_SEQPACKET tests
>   loopback/vsock: setup SEQPACKET ops for transport
>   virtio/vsock: simplify credit update function API
>
>  drivers/vhost/vsock.c                   |   8 +-
>  include/linux/virtio_vsock.h            |  15 +
>  include/net/af_vsock.h                  |   9 +
>  include/uapi/linux/virtio_vsock.h       |  16 +
>  net/vmw_vsock/af_vsock.c                | 588 +++++++++++++++-------
>  net/vmw_vsock/virtio_transport.c        |   5 +
>  net/vmw_vsock/virtio_transport_common.c | 316 ++++++++++--
>  net/vmw_vsock/vsock_loopback.c          |   5 +
>  tools/testing/vsock/util.c              |  32 +-
>  tools/testing/vsock/util.h              |   3 +
>  tools/testing/vsock/vsock_test.c        | 126 +++++
>  11 files changed, 895 insertions(+), 228 deletions(-)
>
> TODO:
> - What to do when the server doesn't support SOCK_SEQPACKET. In the
>   current implementation RST is replied in the same way as when the
>   listening port is not found. I think the current RST is enough,
>   because the case when the server doesn't support SEQ_PACKET is the
>   same as when the listener is missing (e.g.
>   no listener in both cases).

- virtio spec patch

> v3 -> v4:
> - callbacks for loopback transport
> - SEQPACKET specific metadata moved from packet header to payload
>   and called 'virtio_vsock_seq_hdr'
> - record integrity check:
>   1) SEQ_END operation was added, which marks end of record.
>   2) Both SEQ_BEGIN and SEQ_END carry a counter which is incremented
>      on every marker send.
> - af_vsock.c: socket operations for STREAM and SEQPACKET call the same
>   functions instead of having their own "gates" that differ only by
>   name: 'vsock_seqpacket/stream_getsockopt()' is now replaced with
>   'vsock_connectible_getsockopt()'.
> - af_vsock.c: 'seqpacket_dequeue' callback returns an error and a flag
>   that the record is ready. There is no need to return the number of
>   copied bytes, because the case when a record is received
>   successfully is checked at the virtio transport layer, when SEQ_END
>   is processed. Also the user doesn't need the number of copied bytes,
>   because 'recv()' from SEQPACKET could return an error, the length of
>   the user's buffer or the length of the whole record (both are known
>   in af_vsock.c).
> - af_vsock.c: both wait loops in af_vsock.c (for data and space) moved
>   to separate functions because now both are called from several
>   places.
> - af_vsock.c: 'vsock_assign_transport()' checks that 'new_transport'
>   pointer is not NULL and returns 'ESOCKTNOSUPPORT' instead of 'ENODEV'
>   if it failed to use the transport.
> - tools/testing/vsock/vsock_test.c: rename tests
>
> v2 -> v3:
> - patches reorganized: split for prepare and implementation patches
> - local variables are declared in "Reverse Christmas tree" manner
> - virtio_transport_common.c: valid leXX_to_cpu() for vsock header
>   fields access
> - af_vsock.c: 'vsock_connectible_*sockopt()' added as shared code
>   between stream and seqpacket sockets.
> - af_vsock.c: loops in '__vsock_*_recvmsg()' refactored.
> - af_vsock.c: 'vsock_wait_data()' refactored.
>
> v1 -> v2:
> - patches reordered: af_vsock.c related changes now before virtio vsock
> - patches reorganized: more small patches, where +/- are not mixed
> - tests for SOCK_SEQPACKET added
> - all commit messages updated
> - af_vsock.c: 'vsock_pre_recv_check()' inlined to
>   'vsock_connectible_recvmsg()'
> - af_vsock.c: 'vsock_assign_transport()' returns ENODEV if transport
>   was not found
> - virtio_transport_common.c: transport callback for seqpacket dequeue
> - virtio_transport_common.c: simplified
>   'virtio_transport_recv_connected()'
> - virtio_transport_common.c: send reset on socket and packet type
>   mismatch.
>
> --
> 2.25.1
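
For readers following the record framing in the quoted cover letter, the SEQ_BEGIN/SEQ_END scheme can be modelled in plain user-space C roughly as below. This is only a sketch under stated assumptions, not code from the patchset: 'struct pkt', 'enum pkt_op', 'seq_send_record()' and 'seq_check_record()' are hypothetical names, the header fields are host-endian instead of __le32, and msg_len in SEQ_END is left at 0 because the cover letter does not say what it holds there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for struct virtio_vsock_seq_hdr from the cover
 * letter (the real fields are __le32 and the struct is packed). */
struct seq_hdr {
    uint32_t msg_cnt;
    uint32_t msg_len;
};

/* Hypothetical packet model: only what the integrity check needs. */
enum pkt_op { PKT_SEQ_BEGIN, PKT_RW, PKT_SEQ_END };

struct pkt {
    enum pkt_op op;
    struct seq_hdr hdr; /* meaningful for SEQ_BEGIN and SEQ_END */
    uint32_t len;       /* payload bytes, meaningful for RW */
};

/*
 * Sender side: emit SEQ_BEGIN, the RW chunks, then SEQ_END.
 * *cnt is the marker counter; it is incremented after each marker, so
 * SEQ_END carries the SEQ_BEGIN counter plus one.
 * Returns the number of packets written, or 0 if 'out' is too small.
 */
static size_t seq_send_record(struct pkt *out, size_t max, uint32_t *cnt,
                              uint32_t rec_len, uint32_t chunk)
{
    uint32_t left = rec_len;
    uint64_t need;
    size_t n = 0;

    assert(chunk > 0);
    need = 2 + ((uint64_t)rec_len + chunk - 1) / chunk;
    if (need > max)
        return 0;

    out[n++] = (struct pkt){ .op = PKT_SEQ_BEGIN,
                             .hdr = { .msg_cnt = (*cnt)++,
                                      .msg_len = rec_len } };
    while (left > 0) {
        uint32_t part = left < chunk ? left : chunk;

        out[n++] = (struct pkt){ .op = PKT_RW, .len = part };
        left -= part;
    }
    /* Assumption: msg_len is unused in SEQ_END and stays 0. */
    out[n++] = (struct pkt){ .op = PKT_SEQ_END,
                             .hdr = { .msg_cnt = (*cnt)++ } };
    return n;
}

/*
 * Receiver side: validate one record occupying pkts[0..n-1].
 * Returns 0 if the markers and the length are consistent, -1 otherwise.
 */
static int seq_check_record(const struct pkt *pkts, size_t n)
{
    uint64_t received = 0;
    size_t i;

    if (n < 2 || pkts[0].op != PKT_SEQ_BEGIN ||
        pkts[n - 1].op != PKT_SEQ_END)
        return -1;
    for (i = 1; i + 1 < n; i++) {
        if (pkts[i].op != PKT_RW)
            return -1;
        received += pkts[i].len;
    }
    /* SEQ_END counter must exceed the SEQ_BEGIN counter by exactly 1. */
    if (pkts[n - 1].hdr.msg_cnt != pkts[0].hdr.msg_cnt + 1)
        return -1;
    /* Bytes seen between the markers must match the announced length. */
    return received == pkts[0].hdr.msg_len ? 0 : -1;
}
```

In this model, a 10-byte record sent in 4-byte chunks becomes five packets (SEQ_BEGIN, three RW, SEQ_END) and advances the counter by two. Removing any RW packet makes seq_check_record() fail on the length comparison, which is the dropped-packet case the cover letter describes.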