A few months back, I swapped out my dying hard drive for a WD Scorpio Blue.
Cheap, seemed reliable, and it was the only drive the local shop had in
stock. However, it seems that AHCI doesn't like this device, and is having
troubles during an S3 resume. It appears as though I'm experiencing two
types of timeouts when resuming: recoverable, and non-recoverable.
My question is: do I have a bad HDD, or is AHCI just not playing nicely?
First off, if I try to resume when in graphics mode, it pretty much always
fails (screen backlight comes on, but nothing else, system is non-responsive
to all keypresses save ctrl+alt+del and a hard power down). I'm currently
convinced this is because it's doing something that's triggering a
non-recoverable timeout. However, when doing a suspend/resume from a text
console, I can usually hit a recoverable timeout.
Which looks like this (full suspend-resume cycle):
-----
drm0: <Mobile Intel\M-B\M-. GM45 Express Chipset> on vgapci0
info: [drm] MSI enabled 1 message(s)
vgapci0: child drm0 requested pci_enable_busmaster
info: [drm] AGP at 0xd0000000 256MB
info: [drm] Initialized i915 1.6.0 20080730
drm0: [ITHREAD]
<switch from graphics to text; enter S3 sleep>
wlan0: link state changed to DOWN
<resume from S3 sleep>
ugen1.2: <vendor 0x08ff> at usbus1 (disconnected)
ugen2.2: <Chicony Electronics Co., Ltd.> at usbus2 (disconnected)
ugen1.2: <vendor 0x08ff> at usbus1
ugen1.3: <Lenovo Computer Corp> at usbus1 (disconnected)
ugen2.2: <Chicony Electronics Co., Ltd.> at usbus2
ugen1.3: <Lenovo Computer Corp> at usbus1
wlan0: link state changed to UP
ahcich0: Timeout on slot 24
ahcich0: is 00400040 cs 0f000000 ss 0f000000 rs 0ff00000 tfd 80 serr 04050002
-----
Recovery from these timeouts seems to take 30s.
However, periodically when resuming into text mode (and I assume what is
almost always when resuming into graphics mode), I trigger a non-recoverable
timeout:
-----
ahcich0: Timeout on slot 13
ahcich0: is 00400000 cs 0000e000 ss 0000e000 rs 0000e00 tfd 80 serr 04050000
ahcich0: device is not ready (timeout 15000ms) tfd = 00000080
ahcich0: Timeout on slot 15
ahcich0: is 00400000 cs 0000e000 ss 0000e000 rs 0000800 tfd 00 serr 00000000
ahcich0: device is not ready (timeout 15000ms) tfd = 00000000
(ada0:ahcich0:0:0:0): lost device
ahcich0: Poll timeout on slot 15
ahcich0: is 00000000 cs 00008000 ss 00000000 rs 0000800 tfd 00 serr 00000000
GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=13067157504,
length=65536)]
GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=262144, length=8192)]
GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=482927181824,
length=8192)]
GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=482927443968,
length=8192)]
ahcich0: Timeout on slot 15
ahcich0: is 00000000 cs 00038000 ss 00038000 rs 0003800 tfd 00 serr 00000000
ahcich0: devfice is not ready (timeout 15000ms) tfd = 00000080
ahcich0: Poll timeout on slot 17
ahcich0: is 00000000 cs 00020000 ss 00000000 rs 0002000 tfd 00 serr 00000000
ahcich0: Timeout on slot 17
ahcich0: is 00000000 cs 000e0000 ss 000e0000 rs 000e000 tfd 00 serr 00000000
ahcich0: devfice is not ready (timeout 15000ms) tfd = 00000080
-----
This continues, with the slot increasing by two every time. Each
'Timeout'
and 'Poll timeout' takes ~15s to trigger, and I'd let this run to
slot 23
before I powered down.
The HDD is a WD Scorpio blue, model WD5000BEVT-22A0RT0, and isn't exactly
the fastest drive on the planet. SMART seems to be relatively clean, with
some mild questions surrounding attributes 191, 9/193, and 194:
-----
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always -
0
3 Spin_Up_Time 0x0027 186 185 021 Pre-fail Always -
1675
4 Start_Stop_Count 0x0032 055 055 000 Old_age Always -
45174
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always -
0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always -
0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always -
723
10 Spin_Retry_Count 0x0032 100 100 051 Old_age Always -
0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always -
0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always -
57
191 G-Sense_Error_Rate 0x0032 072 072 000 Old_age Always -
28
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always -
48
193 Load_Cycle_Count 0x0032 162 162 000 Old_age Always -
115712
194 Temperature_Celsius 0x0022 112 106 000 Old_age Always -
35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always -
0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always -
0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline -
0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always -
0
200 Multi_Zone_Error_Rate 0x0008 100 253 051 Old_age Offline -
0
-----
So, am I looking at a bum HDD, or is AHCI just not playing nicely?
Hardware and traditional troubleshooting info:
Laptop is a Lenovo X200.
Disk controller:
-----
ahci0@pci0:0:31:2: class=0x010601 card=0x20f817aa chip=0x29298086 rev=0x03
hdr=0x00
vendor = 'Intel Corporation'
device = '82801IB/IR/IH (ICH9 Family) Mobile SATA AHCI
Controller'
class = mass storage
subclass = SATA
-----
loader.conf:
-----
ahci_load="YES"
geom_eli_load="YES"
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"
vfs.zfs.txg.timeout="5"
iwnfw_load="YES"
if_iwn_load="YES"
snd_hda_load="YES"
sem_load="YES"
smb_load="YES"
coretemp_load="YES"
usb_load="YES"
uhid_load="YES"
uhci_load="YES"
ohci_load="YES"
umass_load="YES"
linux_enable="YES"
kern.maxfiles="35000"
cpufreq_load="YES"
hint.p4tcc.0.disable="1"
hint.acpi_throttle.0.disable="1"
hw.pci.do_power_nodriver="2"
-----
dmesg:
-----
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-PRERELEASE #7 r208254: Tue May 18 08:44:31 EDT 2010
root@plebeian.afflictions.org:/usr/obj/usr/src/sys/PLEBEIAN amd64
module_register: module g_eli already exists!
Module g_eli failed to register: 17
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (2394.02-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17
Stepping = 6
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
AMD Features=0x20100800<SYSCALL,NX,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant
real memory = 4294967296 (4096 MB)
avail memory = 3991834624 (3806 MB)
ACPI APIC Table: <LENOVO TP-6D >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
acpi0: <LENOVO TP-6D> on motherboard
CPU0: local APIC error 0x40
acpi0: [ITHREAD]
acpi_ec0: <Embedded Controller: GPE 0x11, ECDT> port 0x62,0x66 on acpi0
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, bff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on
acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
acpi_lid0: <Control Method Lid Switch> on acpi0
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x1800-0x1807 mem
0xf2000000-0xf23fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
agp0: <Intel GM45 SVGA controller> on vgapci0
agp0: detected 32764k stolen memory
agp0: aperture size is 256M
vgapci1: <VGA-compatible display> mem 0xf2400000-0xf24fffff at device 2.1
on pci0
pci0: <simple comms> at device 3.0 (no driver attached)
pci0: <network, ethernet> at device 25.0 (no driver attached)
uhci0: <Intel 82801I (ICH9) USB controller> port 0x1860-0x187f irq 20 at
device 26.0 on pci0
uhci0: [ITHREAD]
usbus0: <Intel 82801I (ICH9) USB controller> on uhci0
uhci1: <Intel 82801I (ICH9) USB controller> port 0x1880-0x189f irq 21 at
device 26.1 on pci0
uhci1: [ITHREAD]
usbus1: <Intel 82801I (ICH9) USB controller> on uhci1
uhci2: <Intel 82801I (ICH9) USB controller> port 0x18a0-0x18bf irq 22 at
device 26.2 on pci0
uhci2: [ITHREAD]
usbus2: <Intel 82801I (ICH9) USB controller> on uhci2
pci0: <serial bus, USB> at device 26.7 (no driver attached)
hdac0: <Intel 82801I High Definition Audio Controller> mem
0xf2620000-0xf2623fff irq 17 at device 27.0 on pci0
hdac0: HDA Driver Revision: 20100226_0142
hdac0: [ITHREAD]
pcib1: <ACPI PCI-PCI bridge> irq 20 at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 21 at device 28.1 on pci0
pci3: <ACPI PCI bus> on pcib2
iwn0: <Intel(R) PRO/Wireless 4965BGN> mem 0xf2500000-0xf2501fff irq 17 at
device 0.0 on pci3
iwn0: MIMO 2T3R, MoW1, address 00:21:5c:5f:7e:e3
iwn0: [ITHREAD]
iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps
36Mbps 48Mbps 54Mbps
pcib3: <ACPI PCI-PCI bridge> irq 23 at device 28.3 on pci0
pci5: <ACPI PCI bus> on pcib3
uhci3: <Intel 82801I (ICH9) USB controller> port 0x18c0-0x18df irq 16 at
device 29.0 on pci0
uhci3: [ITHREAD]
usbus3: <Intel 82801I (ICH9) USB controller> on uhci3
uhci4: <Intel 82801I (ICH9) USB controller> port 0x18e0-0x18ff irq 17 at
device 29.1 on pci0
uhci4: [ITHREAD]
usbus4: <Intel 82801I (ICH9) USB controller> on uhci4
uhci5: <Intel 82801I (ICH9) USB controller> port 0x1c00-0x1c1f irq 18 at
device 29.2 on pci0
uhci5: [ITHREAD]
usbus5: <Intel 82801I (ICH9) USB controller> on uhci5
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib4: <PCI-PCI bridge> at device 30.0 on pci0
pci13: <PCI bus> on pcib4
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci0: <Intel ICH9M AHCI SATA controller> port
0x1c48-0x1c4f,0x183c-0x183f,0x1c40-0x1c47,0x1838-0x183b,0x1c20-0x1c3f mem
0xf2826000-0xf28267ff irq 16 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model Generic PS/2 mouse, device ID 0
battery0: <ACPI Control Method Battery> on acpi0
acpi_acad0: <AC Adapter> on acpi0
orm0: <ISA Option ROMs> at iomem
0xc0000-0xcffff,0xd0000-0xd0fff,0xd1000-0xd1fff,0xd2000-0xd2fff,0xde000-0xdf7ff,0xe0000-0xeffff
on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
to enable, add "vfs.zfs.prefetch_disable=0" to
/boot/loader.conf.
ZFS filesystem version 3
ZFS storage pool version 14
Timecounters tick every 1.000 msec
hdac0: HDA Codec #0: Conexant CX20561 (Hermosa)
pcm0: <HDA Conexant CX20561 (Hermosa) PCM #0 Analog> at cad 0 nid 1 on
hdac0
pcm1: <HDA Conexant CX20561 (Hermosa) PCM #1 Analog> at cad 0 nid 1 on
hdac0
The GEOM class ELI is already loaded.
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 12Mbps Full Speed USB v1.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <Intel> at usbus3
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3
ugen4.1: <Intel> at usbus4
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD5000BEVT-22A0RT0 01.01A01> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
SMP: AP CPU #1 Launched!
hwpmc: TSC/1/64/0x20<REA>
IAP/2/40/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC>
IAF/3/40/0x61<INT,REA,WRI>
Enter passphrase for ada0p3: uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
ugen1.2: <vendor 0x08ff> at usbus1
ugen2.2: <Chicony Electronics Co., Ltd.> at usbus2
ugen5.2: <RICOH> at usbus5
umass0: <RICOH USB2.0-FLASH Media, class 0/0, rev 2.00/0.01, addr 2> on
usbus5
ugen1.3: <Lenovo Computer Corp> at usbus1
(probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI status: Check Condition
(probe0:umass-sim0:0:0:0): SCSI sense: UNIT ATTENTION asc:28,0 (Not ready to
ready change, medium may have changed)
da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
da0: <RICOH R5U880FlashMedia 0000> Removable Direct Access SCSI-2 device
da0: 1.000MB/s transfers
da0: 15326MB (31388672 512 byte sectors: 255H 63S/T 1953C)
GEOM_ELI: Device ada0p3.eli created.
GEOM_ELI: Encryption: AES-CBC 256
GEOM_ELI: Crypto: software
Trying to mount root from zfs:zroot
wlan0: Ethernet address: 00:21:5c:5f:7e:e3
drm0: <Mobile Intel\M-B\M-. GM45 Express Chipset> on vgapci0
info: [drm] MSI enabled 1 message(s)
vgapci0: child drm0 requested pci_enable_busmaster
info: [drm] AGP at 0xd0000000 256MB
info: [drm] Initialized i915 1.6.0 20080730
drm0: [ITHREAD]
-----