On 01/30/2017 08:06 PM, Greg Kurz wrote:>> Currently, under certain circumstances vhost_init_is_le does just a part >> of the initialization job, and depends on vhost_reset_is_le being called >> too. For this reason vhost_vq_init_access used to call vhost_reset_is_le >> when vq->private_data is NULL. This is not only counter intuitive, but >> also real a problem because it breaks vhost_net. The bug was introduced to >> vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for >> legacy devices"). The symptom is corruption of the vq's used.idx field >> (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost >> shutdown on a vq with pending descriptors. >> >> Let us make sure the outcome of vhost_init_is_le never depend on the state >> it is actually supposed to initialize, and fix virtio_net by removing the >> reset from vhost_vq_init_access. >> >> With the above, there is no reason for vhost_reset_is_le to do just half >> of the job. Let us make vhost_reset_is_le reinitialize is_le. >> >> Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> >> Reported-by: Michael A. Tebolt <miket at us.ibm.com> >> Reported-by: Dr. David Alan Gilbert <dgilbert at redhat.com> >> Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices") >> --- > Reviewed-by: Greg Kurz <groug at kaod.org> >Thanks! We have some tests on s390x (that is BE) running, but I won't be able to test the change with cross endian and legacy. What do you think, should I/we RFT or are we fine without? Regards, Halil
On Tue, Jan 31, 2017 at 04:56:13PM +0100, Halil Pasic wrote:> > > On 01/30/2017 08:06 PM, Greg Kurz wrote: > >> Currently, under certain circumstances vhost_init_is_le does just a part > >> of the initialization job, and depends on vhost_reset_is_le being called > >> too. For this reason vhost_vq_init_access used to call vhost_reset_is_le > >> when vq->private_data is NULL. This is not only counter intuitive, but > >> also real a problem because it breaks vhost_net. The bug was introduced to > >> vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for > >> legacy devices"). The symptom is corruption of the vq's used.idx field > >> (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost > >> shutdown on a vq with pending descriptors. > >> > >> Let us make sure the outcome of vhost_init_is_le never depend on the state > >> it is actually supposed to initialize, and fix virtio_net by removing the > >> reset from vhost_vq_init_access. > >> > >> With the above, there is no reason for vhost_reset_is_le to do just half > >> of the job. Let us make vhost_reset_is_le reinitialize is_le. > >> > >> Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> > >> Reported-by: Michael A. Tebolt <miket at us.ibm.com> > >> Reported-by: Dr. David Alan Gilbert <dgilbert at redhat.com> > >> Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices") > >> --- > > Reviewed-by: Greg Kurz <groug at kaod.org> > > > > Thanks! > > We have some tests on s390x (that is BE) running, but I won't be able to > test the change with cross endian and legacy. > > What do you think, should I/we RFT or are we fine without? > > Regards, > HalilMore testing can't hurt. I can merge this meanwhile. -- MST
On Tue, 31 Jan 2017 16:56:13 +0100 Halil Pasic <pasic at linux.vnet.ibm.com> wrote:> On 01/30/2017 08:06 PM, Greg Kurz wrote: > >> Currently, under certain circumstances vhost_init_is_le does just a part > >> of the initialization job, and depends on vhost_reset_is_le being called > >> too. For this reason vhost_vq_init_access used to call vhost_reset_is_le > >> when vq->private_data is NULL. This is not only counter intuitive, but > >> also real a problem because it breaks vhost_net. The bug was introduced to > >> vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for > >> legacy devices"). The symptom is corruption of the vq's used.idx field > >> (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost > >> shutdown on a vq with pending descriptors. > >> > >> Let us make sure the outcome of vhost_init_is_le never depend on the state > >> it is actually supposed to initialize, and fix virtio_net by removing the > >> reset from vhost_vq_init_access. > >> > >> With the above, there is no reason for vhost_reset_is_le to do just half > >> of the job. Let us make vhost_reset_is_le reinitialize is_le. > >> > >> Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> > >> Reported-by: Michael A. Tebolt <miket at us.ibm.com> > >> Reported-by: Dr. David Alan Gilbert <dgilbert at redhat.com> > >> Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices") > >> --- > > Reviewed-by: Greg Kurz <groug at kaod.org> > > > > Thanks! > > We have some tests on s390x (that is BE) running, but I won't be able to > test the change with cross endian and legacy. >I'll try to find some time to run such tests on ppc. Cheers. -- Greg> What do you think, should I/we RFT or are we fine without? > > Regards, > Halil >
On 01/31/2017 07:28 PM, Michael S. Tsirkin wrote:> On Tue, Jan 31, 2017 at 04:56:13PM +0100, Halil Pasic wrote: >> >> >> On 01/30/2017 08:06 PM, Greg Kurz wrote: >>>> Currently, under certain circumstances vhost_init_is_le does just a part >>>> of the initialization job, and depends on vhost_reset_is_le being called >>>> too. For this reason vhost_vq_init_access used to call vhost_reset_is_le >>>> when vq->private_data is NULL. This is not only counter intuitive, but >>>> also real a problem because it breaks vhost_net. The bug was introduced to >>>> vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for >>>> legacy devices"). The symptom is corruption of the vq's used.idx field >>>> (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost >>>> shutdown on a vq with pending descriptors. >>>> >>>> Let us make sure the outcome of vhost_init_is_le never depend on the state >>>> it is actually supposed to initialize, and fix virtio_net by removing the >>>> reset from vhost_vq_init_access. >>>> >>>> With the above, there is no reason for vhost_reset_is_le to do just half >>>> of the job. Let us make vhost_reset_is_le reinitialize is_le. >>>> >>>> Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> >>>> Reported-by: Michael A. Tebolt <miket at us.ibm.com> >>>> Reported-by: Dr. David Alan Gilbert <dgilbert at redhat.com> >>>> Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices") >>>> --- >>> Reviewed-by: Greg Kurz <groug at kaod.org> >>> >> >> Thanks! >> >> We have some tests on s390x (that is BE) running, but I won't be able to >> test the change with cross endian and legacy. >> >> What do you think, should I/we RFT or are we fine without? >> >> Regards, >> Halil > > More testing can't hurt. I can merge this meanwhile. >I received a word from our test team. No problems discovered with a mix of legacy and virtio 1 guests on s390x (was reliably reproducible without this patch with the same setup). Could you please add: Tested-by: Michael A. Tebolt <miket at us.ibm.com> Regards, Halil