Paolo Bonzini
2023-Jul-12 08:06 UTC
[PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
On 7/11/23 22:21, Mike Christie wrote:
> What was the issue you are seeing?
>
> Was it something like you get the UA. We retry then on one of the
> retries the sense is not setup correctly, so the scsi error handler
> runs? That fails and the device goes offline?
>
> If you turn on scsi debugging you would see:
>
>
> [ 335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
> [ 335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> [ 335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
> [ 335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
> [ 335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
> [ 335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense

Does this log come from internal discussions within Oracle?

> I don't know the qemu scsi code well, but I scanned the code for my co-worker
> and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.
>
> How is locking done? when it is a bus level UA but there are multiple devices
> on the bus?

No locking should be necessary, the code is single threaded. However,
what can happen is that two consecutive calls to
virtio_scsi_handle_cmd_req_prepare use the unit attention ReqOps, and
then the second virtio_scsi_handle_cmd_req_submit finds no unit
attention (see the loop in virtio_scsi_handle_cmd_vq). That can
definitely explain the log above.

Paolo
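
The ordering problem described above can be illustrated with a toy model. This is a minimal sketch, not QEMU code: the `Bus` and `Request` classes, the `prepare`/`submit`/`handle_queue` functions, and the result strings are all hypothetical names chosen for illustration. The only thing taken from the thread is the shape of the sequence: every request in a batch is prepared first, then every request is submitted, and the unit attention is consumed by the first submit. How the stale second request actually surfaces in the guest is an assumption here, modelled loosely on the zeroed sense bytes in the log.

```python
from dataclasses import dataclass


@dataclass
class Bus:
    """Hypothetical stand-in for a bus holding one pending unit attention."""
    unit_attention: bool = False


@dataclass
class Request:
    """Hypothetical request; 'ops' records the handler picked at prepare time."""
    tag: int
    ops: str = ""
    result: str = ""


def prepare(bus: Bus, req: Request) -> None:
    # Phase 1: pick the handler from the state that is visible right now.
    req.ops = "unit_attention" if bus.unit_attention else "normal"


def submit(bus: Bus, req: Request) -> None:
    # Phase 2: run the handler that was picked earlier, against current state.
    if req.ops != "unit_attention":
        req.result = "GOOD"
    elif bus.unit_attention:
        bus.unit_attention = False  # reporting the unit attention clears it
        req.result = "CHECK_CONDITION with sense"
    else:
        # Assumption: the handler was chosen for a unit attention that an
        # earlier submit already consumed, so there is no sense data to return.
        req.result = "CHECK_CONDITION without sense"


def handle_queue(bus: Bus, reqs: list[Request]) -> list[str]:
    # All prepares run before any submit, as in a batched queue handler.
    for req in reqs:
        prepare(bus, req)
    for req in reqs:
        submit(bus, req)
    return [req.result for req in reqs]


if __name__ == "__main__":
    bus = Bus(unit_attention=True)
    for line in handle_queue(bus, [Request(tag=16), Request(tag=17)]):
        print(line)
```

In this model no second thread is involved: the stale decision comes purely from splitting "choose handler" and "run handler" into two passes over the same batch. Preparing and submitting one request at a time would give the second request the "normal" handler instead.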
Stefano Garzarella
2023-Jul-12 10:14 UTC
[PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
On Wed, Jul 12, 2023 at 10:06:56AM +0200, Paolo Bonzini wrote:
>On 7/11/23 22:21, Mike Christie wrote:
>>What was the issue you are seeing?
>>
>>Was it something like you get the UA. We retry then on one of the
>>retries the sense is not setup correctly, so the scsi error handler
>>runs? That fails and the device goes offline?
>>
>>If you turn on scsi debugging you would see:
>>
>>
>>[ 335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
>>[ 335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>[ 335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
>>[ 335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[ 335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
>>[ 335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
>>[ 335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense
>
>Does this log come from internal discussions within Oracle?
>
>>I don't know the qemu scsi code well, but I scanned the code for my co-worker
>>and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.
>>
>>How is locking done? when it is a bus level UA but there are multiple devices
>>on the bus?
>
>No locking should be necessary, the code is single threaded. However,
>what can happen is that two consecutive calls to
>virtio_scsi_handle_cmd_req_prepare use the unit attention ReqOps, and
>then the second virtio_scsi_handle_cmd_req_submit finds no unit
>attention (see the loop in virtio_scsi_handle_cmd_vq). That can
>definitely explain the log above.

Yes, this seems to be the case!

Thank you both for the help!

Following Paolo's advice, I'm preparing a series for QEMU to solve the
problem!

Stefano