On Fri, Nov 19, 2021 at 11:10 PM Halil Pasic <pasic at linux.ibm.com> wrote:
>
> On Wed, 27 Oct 2021 10:21:04 +0800
> Jason Wang <jasowang at redhat.com> wrote:
>
> > This patch validate the used buffer length provided by the device
> > before trying to use it. This is done by record the in buffer length
> > in a new field in desc_state structure during virtqueue_add(), then we
> > can fail the virtqueue_get_buf() when we find the device is trying to
> > give us a used buffer length which is greater than the in buffer
> > length.
> >
> > Since some drivers have already done the validation by themselves,
> > this patch tries to makes the core validation optional. For the driver
> > that doesn't want the validation, it can set the
> > suppress_used_validation to be true (which could be overridden by
> > force_used_validation module parameter). To be more efficient, a
> > dedicate array is used for storing the validate used length, this
> > helps to eliminate the cache stress if validation is done by the
> > driver.
> >
> > Signed-off-by: Jason Wang <jasowang at redhat.com>
>
> Hi Jason!
>
> Our CI has detected, that virtio-vsock became unusable with this
> patch on s390x. I didn't test on x86 yet. The guest kernel says
> something like:
> vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0
>
> Did you, or anybody else, see something like this on platforms other than
> s390x?

Adding Stefan and Stefano.

I think it should be a common issue, looking at
vhost_vsock_handle_tx_kick(), it did:

    len += sizeof(pkt->hdr);
    vhost_add_used(vq, head, len);

which looks like a violation of the spec since it's TX.

>
> I had a quick look at this code, and I speculate that it probably
> uncovers a pre-existing bug, rather than introducing a new one.

I agree.

>
> If somebody is already working on this please reach out to me.

AFAIK, no. I think the plan is to fix both the device and driver side
(but I'm not sure we need a new feature for this if we stick to the
validation).

Thanks

>
> Regards,
> Halil
>
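
For readers following the thread, the check described in the quoted patch can be sketched roughly as below. This is a minimal illustration, not the kernel code: `struct sketch_vq`, `sketch_add()`, `sketch_get()` and `RING_SIZE` are hypothetical names, and the real implementation in virtio_ring.c differs in detail. It assumes only what the patch description says: the device-writable ("in") length is recorded when a buffer is added, and a used length greater than that is rejected unless validation is suppressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size for this sketch */

/* Hypothetical stand-in for per-queue state; not the kernel's structure. */
struct sketch_vq {
    uint32_t buflen[RING_SIZE]; /* device-writable ("in") bytes per head */
    int validate;               /* 0 when the driver suppresses validation */
};

/* At add time, record the total length of the device-writable buffers. */
static void sketch_add(struct sketch_vq *vq, unsigned int head,
                       const uint32_t *in_lens, size_t n_in)
{
    uint32_t total = 0;

    assert(head < RING_SIZE);
    for (size_t i = 0; i < n_in; i++)
        total += in_lens[i];
    vq->buflen[head] = total;
}

/* At get time, return 0 if the used length is acceptable, -1 otherwise. */
static int sketch_get(const struct sketch_vq *vq, unsigned int head,
                      uint32_t used_len)
{
    assert(head < RING_SIZE);
    if (vq->validate && used_len > vq->buflen[head])
        return -1; /* device claims to have written more than it could */
    return 0;
}
```

Under these assumptions, a TX buffer added with no device-writable part records an in-length of 0, so any nonzero used length reported for it fails the check, which matches the shape of the message reported above.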
On Mon, 22 Nov 2021 11:51:09 +0800
Jason Wang <jasowang at redhat.com> wrote:

> On Fri, Nov 19, 2021 at 11:10 PM Halil Pasic <pasic at linux.ibm.com> wrote:
> >
> > On Wed, 27 Oct 2021 10:21:04 +0800
> > Jason Wang <jasowang at redhat.com> wrote:
> >
> > > This patch validate the used buffer length provided by the device
> > > before trying to use it. This is done by record the in buffer length
> > > in a new field in desc_state structure during virtqueue_add(), then we
> > > can fail the virtqueue_get_buf() when we find the device is trying to
> > > give us a used buffer length which is greater than the in buffer
> > > length.
> > >
> > > Since some drivers have already done the validation by themselves,
> > > this patch tries to makes the core validation optional. For the driver
> > > that doesn't want the validation, it can set the
> > > suppress_used_validation to be true (which could be overridden by
> > > force_used_validation module parameter). To be more efficient, a
> > > dedicate array is used for storing the validate used length, this
> > > helps to eliminate the cache stress if validation is done by the
> > > driver.
> > >
> > > Signed-off-by: Jason Wang <jasowang at redhat.com>
> >
> > Hi Jason!
> >
> > Our CI has detected, that virtio-vsock became unusable with this
> > patch on s390x. I didn't test on x86 yet. The guest kernel says
> > something like:
> > vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0
> >
> > Did you, or anybody else, see something like this on platforms other than
> > s390x?
>
> Adding Stefan and Stefano.
>
> I think it should be a common issue, looking at
> vhost_vsock_handle_tx_kick(), it did:
>
> len += sizeof(pkt->hdr);
> vhost_add_used(vq, head, len);
>
> which looks like a violation of the spec since it's TX.

I'm not sure the lines above look like a violation of the spec. If you
examine vhost_vsock_alloc_pkt() I believe that you will agree that:
len == pkt->len == pkt->hdr.len
which makes sense since according to the spec both tx and rx messages
are hdr+payload. And I believe hdr.len is the size of the payload,
although that does not seem to be properly documented by the spec.

On the other hand tx messages are stated to be device read-only (in the
spec) so if the device writes stuff, that is certainly wrong.

If that is what happens.

Looking at virtqueue_get_buf_ctx_split() I'm not sure that is what
happens. My hypothesis is that we just assume that the last descriptor
is an 'in' type descriptor (i.e. a device writable one). For tx that
assumption would be wrong.

I will have another look at this today and send a fix patch if my
suspicion is confirmed.

> >
> > I had a quick look at this code, and I speculate that it probably
> > uncovers a pre-existing bug, rather than introducing a new one.
>
> I agree.
>

:) I'm not so sure any more myself.

> >
> > If somebody is already working on this please reach out to me.
>
> AFAIK, no.

Thanks for the info! Then I will dig a little deeper. I asked in order
to avoid doing the debugging and fixing just to see that somebody was
faster :D

> I think the plan is to fix both the device and driver side
> (but I'm not sure we need a new feature for this if we stick to the
> validation).
>
> Thanks
>

Thank you!

Regards,
Halil
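
As a side note on the numbers in the error message, the arithmetic can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: `struct sketch_vsock_hdr` is a hypothetical copy of the packed vsock header layout as I understand it from the Linux UAPI header (fixed-width integers stand in for the little-endian types), and `sketch_tx_used_len()` is a hypothetical helper that mirrors the `len += sizeof(pkt->hdr)` step quoted above. That the failing packet carried no payload is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical copy of the vsock packet header layout, packed so that
 * no padding is added after the last field (GCC/Clang attribute).
 */
struct sketch_vsock_hdr {
    uint64_t src_cid;
    uint64_t dst_cid;
    uint32_t src_port;
    uint32_t dst_port;
    uint32_t len;       /* payload length, per the discussion above */
    uint16_t type;
    uint16_t op;
    uint32_t flags;
    uint32_t buf_alloc;
    uint32_t fwd_cnt;
} __attribute__((packed));

_Static_assert(sizeof(struct sketch_vsock_hdr) == 44,
               "packed header is expected to be 44 bytes");

/* Used length as computed on the device side: payload plus header. */
static uint32_t sketch_tx_used_len(uint32_t payload_len)
{
    return payload_len + (uint32_t)sizeof(struct sketch_vsock_hdr);
}
```

If the layout assumption holds, a header-only TX packet yields a used length of 44, while the buffer itself has no device-writable part, which would be consistent with "used len 44 is larger than in buflen 0".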
Stefano Garzarella
2021-Nov-22 07:42 UTC
[PATCH V5 1/4] virtio_ring: validate used buffer length
On Mon, Nov 22, 2021 at 11:51:09AM +0800, Jason Wang wrote:
>On Fri, Nov 19, 2021 at 11:10 PM Halil Pasic <pasic at linux.ibm.com> wrote:
>>
>> On Wed, 27 Oct 2021 10:21:04 +0800
>> Jason Wang <jasowang at redhat.com> wrote:
>>
>> > This patch validate the used buffer length provided by the device
>> > before trying to use it. This is done by record the in buffer length
>> > in a new field in desc_state structure during virtqueue_add(), then we
>> > can fail the virtqueue_get_buf() when we find the device is trying to
>> > give us a used buffer length which is greater than the in buffer
>> > length.
>> >
>> > Since some drivers have already done the validation by themselves,
>> > this patch tries to makes the core validation optional. For the driver
>> > that doesn't want the validation, it can set the
>> > suppress_used_validation to be true (which could be overridden by
>> > force_used_validation module parameter). To be more efficient, a
>> > dedicate array is used for storing the validate used length, this
>> > helps to eliminate the cache stress if validation is done by the
>> > driver.
>> >
>> > Signed-off-by: Jason Wang <jasowang at redhat.com>
>>
>> Hi Jason!
>>
>> Our CI has detected, that virtio-vsock became unusable with this
>> patch on s390x. I didn't test on x86 yet. The guest kernel says
>> something like:
>> vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0
>>
>> Did you, or anybody else, see something like this on platforms other than
>> s390x?
>
>Adding Stefan and Stefano.
>
>I think it should be a common issue, looking at

Yep, I confirm the same behaviour on x86_64. On Friday evening I had the
same failure while testing latest QEMU and Linux kernel.

>vhost_vsock_handle_tx_kick(), it did:
>
>len += sizeof(pkt->hdr);
>vhost_add_used(vq, head, len);
>
>which looks like a violation of the spec since it's TX.
>
>>
>> I had a quick look at this code, and I speculate that it probably
>> uncovers a pre-existing bug, rather than introducing a new one.
>
>I agree.
>
>>
>> If somebody is already working on this please reach out to me.
>

My plan was to debug and test it today, so let me know if you need some
help.

>AFAIK, no. I think the plan is to fix both the device and driver side
>(but I'm not sure we need a new feature for this if we stick to the
>validation).
>

Yes, maybe we need a new feature, since I believe there has been this
problem since the beginning.

Thanks,
Stefano