I upgraded a RELENG_8 box from a kernel built around June 15th to one
built today to pick up some of the zfs and bind updates. On reboot, the
box panicked twice and then booted fine the third time. I have not seen
this error before and am not sure whether it's a hardware issue or some
odd timing issue I ran into. smartctl shows no errors recorded in the
logs of any of the disks.
From the serial console, this is what I saw:
atapci0: <Intel ICH9 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x3410-0x341f,0x3400-0x340f irq 21 at device 31.2 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
atapci1: <Intel ICH9 SATA300 controller> port 0x3428-0x342f,0x3444-0x3447,0x3420-0x3427,0x3440-0x3443,0x30f0-0x30ff,0x30e0-0x30ef irq 21 at device 31.5 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
.
.
.
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <Intel> at usbus3
uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <Intel> at usbus4
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ugen6.1: <Intel> at usbus6
uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ugen7.1: <Intel> at usbus7
uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
ad0: 1430799MB <Seagate ST31500341AS CC1H> at ata0-master UDMA100 SATA 3Gb/s
uhub3: 6 ports with 6 removable, self powered
uhub7: 6 ports with 6 removable, self powered
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
unknown: WARNING - ATA_IDENTIFY taskqueue timeout - completing request directly
unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
ad4: 1430799MB <Seagate ST31500341AS CC1H> at ata2-master UDMA100 SATA 3Gb/s
subdisk4: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x308
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80419eae
stack pointer = 0x28:0xffffff800009dab0
frame pointer = 0x28:0xffffff800009dad0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (thread taskq)
trap number = 12
ad6: 1430799MB <Seagate ST31500341AS CC1H> at ata3-master UDMA100 SATA 3Gb/s
panic: page fault
cpuid = 0
Uptime: 14s
Cannot dump. Device not defined or unavailable.
The normal boot looks like this:
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ugen6.1: <Intel> at usbus6
uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ugen7.1: <Intel> at usbus7
uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
ad0: 1430799MB <Seagate ST31500341AS CC1H> at ata0-master UDMA100 SATA 3Gb/s
ad2: 76319MB <Seagate ST380811AS 3.AAE> at ata1-master UDMA100 SATA 1.5Gb/s
ad3: 1907729MB <WDC WD2001FASS-00U0B0 01.00101> at ata1-slave UDMA100 SATA 3Gb/s
GEOM: ad2s1: geometry does not match label (255h,63s != 16h,63s).
ad4: 1430799MB <Seagate ST31500341AS CC1H> at ata2-master UDMA100 SATA 3Gb/s
ad6: 1430799MB <Seagate ST31500341AS CC1H> at ata3-master UDMA100 SATA 3Gb/s
uhub3: 6 ports with 6 removable, self powered
uhub7: 6 ports with 6 removable, self powered
pmp0 at siisch0 bus 0 scbus0 target 15 lun 0
pmp0: <Port Multiplier 47261095 1f06> ATA-0 device
pmp0: 300.000MB/s transfers (SATA 2.x, NONE, PIO 8192bytes)
pmp0: 5 fan-out ports
pmp1 at siisch1 bus 0 scbus1 target 15 lun 0
pmp1: <Port Multiplier 47261095 1f06> ATA-0 device
pmp1: 300.000MB/s transfers (SATA 2.x, NONE, PIO 8192bytes)
pmp1: 5 fan-out ports
ada0 at siisch0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD1001FALS-00J7B1 05.00K05> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1 at siisch0 bus 0 scbus0 target 1 lun 0
ada1: <WDC WD1001FALS-00J7B1 05.00K05> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2 at siisch0 bus 0 scbus0 target 2 lun 0
ada2: <WDC WD1001FALS-00J7B1 05.00K05> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at siisch0 bus 0 scbus0 target 3 lun 0
ada3: <WDC WD1001FALS-00J7B1 05.00K05> ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada4 at siisch1 bus 0 scbus1 target 0 lun 0
ada4: <WDC WD2001FASS-00U0B0 01.00101> ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5 at siisch1 bus 0 scbus1 target 1 lun 0
ada5: <WDC WD1501FASS-00W2B0 05.01D05> ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
ada6 at siisch1 bus 0 scbus1 target 2 lun 0
ada6: <WDC WD1501FASS-00W2B0 05.01D05> ATA-8 SATA 2.x device
ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
ada7 at siisch1 bus 0 scbus1 target 3 lun 0
ada7: <WDC WD1501FASS-00W2B0 05.01D05> ATA-8 SATA 2.x device
ada7: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
ada8 at siisch1 bus 0 scbus1 target 4 lun 0
ada8: <WDC WD1501FASS-00W2B0 05.01D05> ATA-8 SATA 2.x device
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
lapic1: Forcing LINT1 to edge trigger
SMP: AP CPU #1 Launched!
e.g. the smartctl output for two of the disks:
smartctl -a /dev/ad3
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: WDC WD2001FASS-00U0B0
Serial Number: WD-WMAUR0328005
Firmware Version: 01.00101
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Tue Aug 2 16:52:52 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (30900) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   047   047   021    Pre-fail  Always       -       14650
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8630
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   134   134   000    Old_age   Always       -       200377
194 Temperature_Celsius     0x0022   121   113   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4992         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
smartctl -a /dev/ad2
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.9 family
Device Model: ST380811AS
Serial Number: 6PS01Q17
Firmware Version: 3.AAE
User Capacity: 80,026,361,856 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Tue Aug 2 16:52:59 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  27) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   095   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       63
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       248537285
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5104
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       89
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   045   045    Old_age   Always   In_the_past 31 (Lifetime Min/Max 27/32)
194 Temperature_Celsius     0x0022   031   055   000    Old_age   Always       -       31 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   057   048   000    Old_age   Always       -       24265782
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/