Stefano Garzarella
2022-Jul-20 09:30 UTC
[RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
On Wed, Jul 20, 2022 at 06:07:47AM +0000, Arseniy Krasnov wrote:>On 19.07.2022 15:58, Stefano Garzarella wrote: >> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote: >>> Hello, >>> >>> during my experiments with zerocopy receive, i found, that in some >>> cases, poll() implementation violates POSIX: when socket has non- >>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and >>> POLLRDNORM bits in 'revents' even number of bytes available to read >>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees >>> POLLIN flag and then tries to read data(for example using? 'read()' >>> call), but read call will be blocked, because? SO_RCVLOWAT logic is >>> supported in dequeue loop in af_vsock.c. But the same time,? POSIX >>> requires that: >>> >>> "POLLIN???? Data other than high-priority data may be read without >>> ?????????? blocking. >>> POLLRDNORM Normal data may be read without blocking." >>> >>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293. >>> >>> So, we have, that poll() syscall returns POLLIN, but read call will >>> be blocked. >>> >>> Also in man page socket(7) i found that: >>> >>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a >>> socket as readable only if at least SO_RCVLOWAT bytes are available." >>> >>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it >>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with >>> this case for TCP socket, it works as POSIX required. >> >> I tried to look at the code and it seems that only TCP complies with it or am I wrong? >Yes, i checked AF_UNIX, it also don't care about that. It calls skb_queue_empty() that of >course ignores SO_RCVLOWAT. >> >>> >>> I've added some fixes to af_vsock.c and virtio_transport_common.c, >>> test is also implemented. >>> >>> What do You think guys? >> >> Nice, thanks for fixing this and for the test! >> >> I left some comments, but I think the series is fine if we will support it in all transports. >Ack >> >> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse. >Seems sock_poll() which is socket layer entry point for poll() doesn't contain any such checks >> >> @David, @Jakub, @Paolo, any advice? >> >> Thanks, >> Stefano >> > >PS: moreover, i found one more interesting thing with TCP and poll: TCP receive logic wakes up poll waiter >only when number of available bytes > SO_RCVLOWAT. E.g. it prevents "spurious" wake ups, when poll will be >woken up because new data arrived, but POLLIN to allow user dequeue this data won't be set(as amount of data >is too small). >See tcp_data_ready() in net/ipv4/tcp_input.cDo you mean that we should call sk->sk_data_ready(sk) checking SO_RCVLOWAT? It seems fine, maybe we can add vsock_data_ready() in af_vsock.c that transports should call instead of calling sk->sk_data_ready(sk) directly. Then we can something similar to tcp_data_ready(). Thanks, Stefano