We're seeing some strange timeout errors on some new Supermicro
X9DRT-HF motherboards we have here when combined with KINGSTON
HyperX 3K SSDs.

It seems that when a disk is connected to the second channel, reads
often time out, stalling all IO under 8.3-RELEASE-p3.

When this happens we see:-
Jul 27 14:35:59 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:35:59 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00880000 cmd 0004c017
Jul 27 14:37:41 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:37:41 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00880000 cmd 0004c017
Jul 27 14:38:35 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:38:35 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00880000 cmd 0004c017
Jul 27 14:39:05 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:05 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00880000 cmd 0004c017
Jul 27 14:39:39 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:39 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00880000 cmd 0004c017
Jul 27 13:58:06 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 13:58:06 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00004000 rs 00004000 tfd 40 serr 00880000 cmd 0004ce17
Jul 27 14:21:17 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 14:21:17 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00004000 rs 00004000 tfd 40 serr 00880000 cmd 0004ce17
Jul 27 14:29:16 lon059 kernel: ahcich1: Timeout on slot 7 port 0
Jul 27 14:29:16 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00000080 rs 00000080 tfd 40 serr 00880000 cmd 0004c717
Jul 27 14:31:43 lon059 kernel: ahcich1: Timeout on slot 12 port 0
Jul 27 14:31:43 lon059 kernel: ahcich1: is 00000000 cs 00000000 ss 00001000 rs 00001000 tfd 40 serr 00880000 cmd 0004cc17
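
For anyone wanting to read the register dumps above without doing the
hex by hand, here is a rough Python sketch that decodes one of the
"is ... cmd ..." lines. It rests on assumptions that are not stated in
the log itself: that "ss" is the port's SActive bitmask, that bits 12:8
of "cmd" hold the current command slot, and that the upper 16 bits of
"serr" follow the usual SATA SError DIAG layout. The bit names in the
table and the decode() helper are my own labels, not driver output.

```python
import re

# SError DIAG bits (upper half of the register). The names follow my
# reading of the SATA/AHCI SError layout and are an assumption here.
SERR_DIAG = {
    16: "N: PhyRdy change",
    17: "I: PHY internal error",
    18: "W: comm wake",
    19: "B: 10b-to-8b decode error",
    20: "D: disparity error",
    21: "C: CRC error",
    22: "H: handshake error",
    23: "S: link sequence error",
    24: "T: transport state transition error",
    25: "F: unknown FIS type",
    26: "X: exchanged",
}

# Matches "name hexvalue" pairs as printed in the timeout dump.
FIELD_RE = re.compile(r"\b(is|cs|ss|rs|tfd|serr|cmd)\s+([0-9a-fA-F]+)")


def set_bits(value):
    """Return the positions of all set bits in a 32-bit value."""
    return [bit for bit in range(32) if value & (1 << bit)]


def decode(line):
    """Decode one register dump line (hypothetical helper)."""
    regs = {name: int(val, 16) for name, val in FIELD_RE.findall(line)}
    return {
        # Slots with an outstanding queued command, assuming ss = SActive.
        "active_slots": set_bits(regs["ss"]),
        # Current command slot, assuming it sits in cmd bits 12:8.
        "current_slot": (regs["cmd"] >> 8) & 0x1F,
        # Human-readable names for the DIAG bits that are set.
        "serr_diag": [SERR_DIAG[b] for b in set_bits(regs["serr"])
                      if b in SERR_DIAG],
    }


if __name__ == "__main__":
    sample = ("ahcich1: is 00000000 cs 00000000 ss 00004000 rs 00004000 "
              "tfd 40 serr 00880000 cmd 0004ce17")
    print(decode(sample))
```

Under those assumptions, serr 00880000 has bits 19 and 23 set in every
dump, and the slot taken from "ss" and "cmd" agrees with the slot
number in the preceding "Timeout on slot N" line.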

The disk on ahcich0 is identical but doesn't seem to exhibit the
same problem. I thought it might be a disk issue, even though the
disks are brand new, but 2 out of the 3 machines tested have the
same problem.

In addition, I've not managed to reproduce the issue if I force
SATA to rev 2 with: hint.ahcich.1.sata_rev=2

The machine is running with the latest SSD and machine firmware / BIOS.

Could this be an ahci bug?

dmesg and camcontrol output:-
ahci0: <Intel Patsburg AHCI SATA controller> port 0x9050-0x9057,0x9040-0x9043,0x9030-0x9037,0x9020-0x9023,0x9000-0x901f mem 0xdfa22000-0xdfa227ff irq 18 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich5: [ITHREAD]
camcontrol identify ada1
pass1: <KINGSTON SH103S3120G 501ABBF0> ATA-8 SATA 3.x device
pass1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
protocol              ATA/ATAPI-8 SATA 3.x
device model          KINGSTON SH103S3120G
firmware revision     501ABBF0
serial number         50026B7223027059
WWN                   50026b7223027059
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                         Support  Enabled  Value        Vendor
read ahead                      yes      yes
write cache                     yes      yes
flush cache                     yes      yes
overlap                         no
Tagged Command Queuing (TCQ)    no       no
Native Command Queuing (NCQ)    yes               32 tags
SMART                           yes      yes
microcode download              yes      yes
security                        yes      no
power management                yes      yes
advanced power management       yes      yes      254/0xFE
automatic acoustic management   no       no
media status notification       no       no
power-up in Standby             yes      no
write-read-verify               yes      no       0/0x0
unload                          yes      yes
free-fall                       no       no
data set management (DSM/TRIM)  yes
DSM - max 512byte blocks        yes               8
DSM - deterministic read        yes               any value

Regards
Steve