Stefano Garzarella
2023-May-03 12:09 UTC
[PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support
On Sat, Apr 15, 2023 at 05:29:12PM +0000, Bobby Eshleman wrote:>On Fri, Apr 28, 2023 at 12:29:50PM +0200, Stefano Garzarella wrote: >> On Sat, Apr 15, 2023 at 10:30:55AM +0000, Bobby Eshleman wrote: >> > On Wed, Apr 19, 2023 at 11:30:53AM +0200, Stefano Garzarella wrote: >> > > On Fri, Apr 14, 2023 at 12:25:59AM +0000, Bobby Eshleman wrote: >> > > > Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock >> > > > it does not scale when many senders share a socket. >> > > > >> > > > Prior to this patch the socket lock is used to protect the local_addr, >> > > > remote_addr, transport, and buffer size variables. What follows are the >> > > > new protection schemes for the various protected fields that ensure a >> > > > race-free multi-sender sendmsg() path for vsock dgrams. >> > > > >> > > > - local_addr >> > > > local_addr changes as a result of binding a socket. The write path >> > > > for local_addr is bind() and various vsock_auto_bind() call sites. >> > > > After a socket has been bound via vsock_auto_bind() or bind(), subsequent >> > > > calls to bind()/vsock_auto_bind() do not write to local_addr again. bind() >> > > > rejects the user request and vsock_auto_bind() early exits. >> > > > Therefore, the local addr can not change while a parallel thread is >> > > > in sendmsg() and lock-free reads of local addr in sendmsg() are safe. >> > > > Change: only acquire lock for auto-binding as-needed in sendmsg(). >> > > > >> > > > - vsk->transport >> > > > Updated upon socket creation and it doesn't change again until the >> > > >> > > This is true only for dgram, right? >> > > >> > >> > Yes. >> > >> > > How do we decide which transport to assign for dgram? >> > > >> > >> > The transport is assigned in proto->create() [vsock_create()]. It is >> > assigned there *only* for dgrams, whereas for streams/seqpackets it is >> > assigned in connect(). vsock_create() sets transport to >> > 'transport_dgram' if sock->type == SOCK_DGRAM. >> > >> > vsock_sk_destruct() then eventually sets vsk->transport to NULL. >> > >> > Neither destruct nor create can occur in parallel with sendmsg(). >> > create() hasn't yet returned the sockfd for the user to call upon it >> > sendmsg(), and AFAICT destruct is only called after the final socket >> > reference is released, which only happens after the socket no longer >> > exists in the fd lookup table and so sendmsg() will fail before it ever >> > has the chance to race. >> >> This is okay if a socket can be assigned to a single transport, but with >> dgrams I'm not sure we can do that. >> >> Since it is not connected, a socket can send or receive packets from >> different transports, so maybe we can't assign it to a specific one, >> but do a lookup for every packets to understand which transport to use. >> > >Yes this is true, this lookup needs to be implemented... currently >sendmsg() doesn't do this at all. It grabs the remote_addr when passed >in from sendto(), but then just uses the same old transport from vsk. >You are right that sendto() should be a per-packet lookup, not a >vsk->transport read. Perhaps I should add that as another patch in this >series, and make it precede this one?Yep, I think so, we need to implement it before adding a new transport that can support dgram.> >For the send() / sendto(NULL) case where vsk->transport is being read, I >do believe this is still race-free, but... > >If we later support dynamic transports for datagram, such that >connect(VMADDR_LOCAL_CID) sets vsk->transport to transport_loopback, >connect(VMADDR_CID_HOST) sets vsk->transport to something like >transport_datagram_g2h, etc..., then vsk->transport will need to be >bundled into the RCU-protected pointer too, since it may change when >remote_addr changes.Yep, I think so. Although in vsock_dgram_connect we call lock_sock(), so maybe that could be enough to protect us. In general I think we should use vsk->transport if vsock_dgram_connect() is called, or we need to do per-packet lookup. Another think I would change, is the dgram_dequeue() callback. I would remove it, and move in the core the code in vmci_transport_dgram_dequeue() since it seems pretty generic. This should work well if every transport uses sk_receive_skb() to enqueue sk_buffs in the socket buffer. Thanks, Stefano