thr3ads.net - Virtualization - [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup [Dec 2022]

If this information is useful, please help other people find it:
Share via:
Stefano Garzarella
2022-Dec-20 08:33 UTC
[RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup

On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov
wrote:>On 19.12.2022 18:41, Stefano Garzarella wrote:
>
>Hello!
>
>> Hi Arseniy,
>>
>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov at
sberdevices.ru> wrote:
>>>
>>> Hello,
>>>
>>> seems I found strange thing(may be a bug) where sender('tx'
later) and
>>> receiver('rx' later) could stuck forever. Potential fix is
in the first
>>> patch, second patch contains reproducer, based on vsock test suite.
>>> Reproducer is simple: tx just sends data to rx by 'write()
syscall, rx
>>> dequeues it using 'read()' syscall and uses
'poll()' for waiting. I run
>>> server in host and client in guest.
>>>
>>> rx side params:
>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>> 2) SO_RCVLOWAT is 128Kb.
>>>
>>> What happens in the reproducer step by step:
>>>
>>
>> I put the values of the variables involved to facilitate understanding:
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>
>> The credit update is sent if
>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>
>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>> 3) tx waits for space in 'write()' to send last 1 byte
>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb,
POLLIN is set
>>> 5) rx reads 64Kb, credit update is not sent due to *
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>     free_space = 192 KB
>>
>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb,
POLLIN is set
>>> 7) rx reads 64Kb, credit update is not sent due to *
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>     free_space = 128 KB
>>
>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb,
POLLIN is set
>>> 9) rx reads 64Kb, credit update is not sent due to *
>>
>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 196 KB; last_fwd_cnt = 0;
>>     free_space = 64 KB
>>
>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx
waits in poll()
>>
>> I agree that the TX is stuck because we are not sending the credit
>> update, but also if RX sends the credit update at step 9, RX won't
be
>> woken up at step 10, right?
>
>Yes, RX will sleep, but TX will wake up and as we inform TX how much
>free space we have, now there are two cases for TX:
>1) send "small" rest of data(e.g. without blocking again), leave
'write()'
>   and continue execution. RX still waits in 'poll()'. Later TX will
>   send enough data to wake up RX.
>2) send "big" rest of data - if rest is too big to leave
'write()' and TX
>   will wait again for the free space - it will be able to send enough data
>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in
RX.
Right, so I'd update the test to behave like this.
And I'd explain better the problem we are going to fix in the commit 
message.
>>
>>>
>>> * is optimization in 'virtio_transport_stream_do_dequeue()'
which
>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>
>>> Now tx side waits for space inside write() and rx waits in poll()
for
>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait
forever. I
>>> think, possible fix is to send credit update not only when we have
too
>>> small space, but also when number of bytes in receive queue is
smaller
>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader.
I'm not
>>> sure about correctness of this idea, but anyway - I think that
problem
>>> above exists. What do You think?
>>
>> I'm not sure, I have to think more about it, but if RX reads less 
>> than
>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>
>> In this case we are only unstucking TX, but even if it sends that
single
>> byte, RX is still stuck and not consuming it, so it was useless to wake
>> up TX if RX won't consume it anyway, right?
>
>1) I think it is not useless, because we inform(not just wake up) TX that
>there is free space at RX side - as i mentioned above.
>2) Anyway i think that this situation is a little bit strange: TX thinks
that
>there is no free space at RX and waits for it, but there is free space at
RX!
>At the same time, RX waits in poll() forever - it is ready to get new
portion
>of data to return POLLIN, but TX "thinks" exactly opposite thing -
RX is full
>of data. Of course, if there will be just stalls in TX data handling - it
will
>be ok - just performance degradation, but TX stucks forever.
We did it to avoid a lot of credit update messages.
Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 
128 KB and then reads only half of it?

So I think if the users set SO_RCVLOWAT to a value and then RX reads 
less then it, is expected to get stuck.

Anyway, since the change will not impact the default behaviour 
(SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain 
the case better and improve the test.
>
>>
>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>> then it would still send the credit update even without this patch and
>> TX will send the 1 byte.
>
>But how RX will wake up in this case? E.g. it calls poll() without timeout,
>connection is established, RX ignores signal
RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the 
buffer. Then RX will read it and send the credit update to TX because
free_space is 0.

Thanks,
Stefano
Virtualization - Dec 2022 - [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup

[RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup