On Mon, 10 Apr 2017 10:49:59 -0700 "Mahlon E. Smith" <mahlon at martini.nu> wrote:
> Hi. Got some new machines running 11.0-RELEASE-p8 with a small pile of SSDs
> in them.
>
> Getting some strange CAM timeouts at boot that dramatically delay
> startup times. These errors don't happen at all after boot; everything
> seems just fine.
>
> Any clues / tuning suggestions are appreciated.
>
>
> Relevant stuff from dmesg:
>
> mpr0: <Avago Technologies (LSI) SAS3008> port 0x3000-0x30ff mem 0x92000000-0x9200ffff,0x91f00000-0x91ffffff at device 0.0 numa-domain 0 on pci3
> mpr0: IOCFacts :
> MsgVersion: 0x205
> HeaderVersion: 0x2a00
> IOCNumber: 0
> IOCExceptions: 0x0
> MaxChainDepth: 128
> NumberOfPorts: 1
> RequestCredit: 9680
> ProductID: 0x2221
> IOCRequestFrameSize: 32
> MaxInitiators: 0
> MaxTargets: 544
> MaxSasExpanders: 125
> MaxEnclosures: 126
> HighPriorityCredit: 124
> MaxReplyDescriptorPostQueueDepth: 65504
> ReplyFrameSize: 32
> MaxVolumes: 0
> MaxDevHandle: 677
> MaxPersistentEntries: 240
> mpr0: Firmware: 13.00.00.00, Driver: 13.01.00.00-fbsd
> mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
>
> da0 at mpr0 bus 0 scbus0 target 19 lun 0
> da0: <ATA SAMSUNG MZ7KM960 GB32> Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number S2HTNX0H701448
> da0: 1200.000MB/s transfers
> da0: Command Queueing enabled
> da0: 915715MB (1875385008 512 byte sectors)
> da0: quirks=0x8<4K>
>
>
> mpr0: Sending reset from mprsas_send_abort for target ID 19
> mpr0: Unfreezing devq for target ID 19
> (da0:mpr0:0:19:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da0:mpr0:0:19:0): CAM status: Command timeout
> (da0:mpr0:0:19:0): Retrying command
> (da0:mpr0:0:19:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da0:mpr0:0:19:0): CAM status: SCSI Status Error
> (da0:mpr0:0:19:0): SCSI status: Check Condition
> (da0:mpr0:0:19:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da0:mpr0:0:19:0): Retrying command (per sense data)
> (noperiph:mpr0:0:4294967295:0): SMID 2 Aborting command 0xfffffe00026066c0
> mpr0: Sending reset from mprsas_send_abort for target ID 20
> mpr0: Unfreezing devq for target ID 20
> (da1:mpr0:0:20:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da1:mpr0:0:20:0): CAM status: Command timeout
> (da1:mpr0:0:20:0): Retrying command
> (da1:mpr0:0:20:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da1:mpr0:0:20:0): CAM status: SCSI Status Error
> (da1:mpr0:0:20:0): SCSI status: Check Condition
> (da1:mpr0:0:20:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da1:mpr0:0:20:0): Retrying command (per sense data)
> (noperiph:mpr0:0:4294967295:0): SMID 3 Aborting command 0xfffffe0002609750
> mpr0: Sending reset from mprsas_send_abort for target ID 21
> (da2:mpr0:0:21:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> mpr0: Unfreezing devq for target ID 21
> (da2:mpr0:0:21:0): CAM status: Command timeout
> (da2:mpr0:0:21:0): Retrying command
> (da2:mpr0:0:21:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da2:mpr0:0:21:0): CAM status: SCSI Status Error
> (da2:mpr0:0:21:0): SCSI status: Check Condition
> (da2:mpr0:0:21:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da2:mpr0:0:21:0): Retrying command (per sense data)
> (noperiph:mpr0:0:4294967295:0): SMID 4 Aborting command 0xfffffe000260c7e0
> mpr0: Sending reset from mprsas_send_abort for target ID 22
> mpr0: Unfreezing devq for target ID 22
> (da3:mpr0:0:22:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da3:mpr0:0:22:0): CAM status: Command timeout
> (da3:mpr0:0:22:0): Retrying command
> (da3:mpr0:0:22:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da3:mpr0:0:22:0): CAM status: SCSI Status Error
> (da3:mpr0:0:22:0): SCSI status: Check Condition
> (da3:mpr0:0:22:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da3:mpr0:0:22:0): Retrying command (per sense data)
> (noperiph:mpr0:0:4294967295:0): SMID 5 Aborting command 0xfffffe0002614f10
> mpr0: Sending reset from mprsas_send_abort for target ID 25
> mpr0: Unfreezing devq for target ID 25
> (da6:mpr0:0:25:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da6:mpr0:0:25:0): CAM status: Command timeout
> (da6:mpr0:0:25:0): Retrying command
> (da6:mpr0:0:25:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da6:mpr0:0:25:0): CAM status: SCSI Status Error
> (da6:mpr0:0:25:0): SCSI status: Check Condition
> (da6:mpr0:0:25:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da6:mpr0:0:25:0): Retrying command (per sense data)
> (noperiph:mpr0:0:4294967295:0): SMID 6 Aborting command 0xfffffe0002617fa0
> mpr0: Sending reset from mprsas_send_abort for target ID 26
> mpr0: Unfreezing devq for target ID 26
> (da7:mpr0:0:26:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da7:mpr0:0:26:0): CAM status: Command timeout
> (da7:mpr0:0:26:0): Retrying command
> (da7:mpr0:0:26:0): READ(10). CDB: 28 00 6f c8 1a af 00 00 01 00
> (da7:mpr0:0:26:0): CAM status: SCSI Status Error
> (da7:mpr0:0:26:0): SCSI status: Check Condition
> (da7:mpr0:0:26:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da7:mpr0:0:26:0): Retrying command (per sense data)
>
>
> --
> Mahlon E. Smith

In the past, when I've run into this issue, I've added the following to
loader.conf(5):

kern.cam.boot_delay="<some high enough number>"

The value is in milliseconds, and you'll want to tune it to find a
"sweet spot". For example, on one of my boxes it reads:

kern.cam.boot_delay="7000"

which is 7 seconds.
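
Separately, and only as a sketch: every timed-out command in the quoted
log carries the same CDB, so it may help to decode it. Assuming the
standard READ(10) layout (bytes 2-5 hold the big-endian LBA, bytes 7-8
the transfer length), the Python below works it out. The helper name
decode_read10 is made up for illustration; the sector count is taken
from the da0 line in the quoted dmesg.

```python
import struct

def decode_read10(cdb_hex):
    """Return (lba, transfer_length) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Bytes 2-5: big-endian logical block address.
    (lba,) = struct.unpack(">I", cdb[2:6])
    # Bytes 7-8: big-endian transfer length in blocks.
    (length,) = struct.unpack(">H", cdb[7:9])
    return lba, length

# CDB copied from the quoted log; sector count from "da0: 915715MB (...)".
lba, length = decode_read10("28 00 6f c8 1a af 00 00 01 00")
total_sectors = 1875385008

print(lba, length, lba == total_sectors - 1)
```

So each of these is a one-block read of LBA 1875385007, the last sector
the disk reports. That would be consistent with something probing the
end of each disk at boot rather than ordinary I/O, though that reading
is an assumption and not something the log confirms.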
HTH
--Chris