Yes, I know 4.11 is EOL'd at the end of this month, but hopefully
someone can shed some light on this problem anyways. I simply don't
have the knowledge of what's going on on a low-level to determine
the cause.
I do have serial console on this box, and after enabling some
debugging for the ahc(4) driver a few months back, was able to
get something intelligent out of the system regarding SCBs this
morning. This may not be useful (or the cause), though. I also
cannot enable drop-to-DDB-on-serial-break because our Portmaster 2
has been known to send a serial break on rare occasion. :-(
Every so often (sometimes hours, sometimes months -- usually months),
the 4.11 box we have "locks up" in the sense that both NICs on the
box stop working, and the SCSI controller also appears hung. This
problem has existed for a couple years; it's not specific to 4.11
(versus 4.10 or 4.9).
I have to hard reset or power cycle the box to get it working again.
The problem will continue indefinitely until the machine is reset;
meaning it does not recover on its own. Naturally this means quite
an ugly fsck when the machine comes back up.
The initial symptoms are:
fxp0: device timeout
fxp1: device timeout
ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
Hardware:
* 2x Pentium III 933MHz
* Tyan Tiger 200
- VIA NB/SB chipset (probably the cause of all this :) )
- Mainboard running latest BIOS
- 2x Intel 82559 on-board NIC
* 512MB RAM (ECC; has built world for years no problem)
* Adaptec 29160 U160 controller (physical card, not on-board);
not sure what Adaptec BIOS revision (anyway to check via FreeBSD?)
* Hard disk is a single 16GB U160/SCSI-3 drive
* Kernel is SMP
Devices and associated IRQs:
fxp0: <Intel 82559 Pro/100 Ethernet> port 0xe000-0xe03f mem
0xd6000000-0xd60fffff,0xd6202000-0xd6202fff irq 10 at device 13.0 on pci0
fxp1: <Intel 82559 Pro/100 Ethernet> port 0xe400-0xe43f mem
0xd6100000-0xd61fffff,0xd6201000-0xd6201fff irq 11 at device 14.0 on pci0
ahc0: <Adaptec 29160B Ultra160 SCSI adapter> port 0xe800-0xe8ff mem
0xd6203000-0xd6203fff irq 11 at device 16.0 on pci0
da0: <IBM DDYS-T18350N S96H> Fixed Direct Access SCSI-3 device
# vmstat -i
ata0 irq14 6 0
fxp0 irq10 14874 28
mux irq11 65028 125
fdc0 irq6 1 0
sio0 irq4 948 1
clk irq0 516187 998
rtc irq8 66071 127
Total 663115 1282
# pciconf -l
agp0@pci0:0:0: class=0x060000 card=0x00000000 chip=0x06911106 rev=0xc4 hdr=0x00
pcib2@pci0:1:0: class=0x060400 card=0x00000080 chip=0x85981106 rev=0x00 hdr=0x01
none0@pci0:6:0: class=0x030000 card=0x00081002 chip=0x47521002 rev=0x27 hdr=0x00
isab0@pci0:7:0: class=0x060100 card=0x00001106 chip=0x06861106 rev=0x40 hdr=0x00
atapci0@pci0:7:1: class=0x01018a card=0x05711106 chip=0x05711106 rev=0x06
hdr=0x00
viapropm0@pci0:7:4: class=0x060000 card=0x00000000 chip=0x30571106 rev=0x40
hdr=0x00
fxp0@pci0:13:0: class=0x020000 card=0x000c8086 chip=0x12298086 rev=0x08 hdr=0x00
fxp1@pci0:14:0: class=0x020000 card=0x000c8086 chip=0x12298086 rev=0x08 hdr=0x00
ahc0@pci0:16:0: class=0x010000 card=0xe2209005 chip=0x00809005 rev=0x02 hdr=0x00
I can include my kernel configuration if need be, but it's fairly
standard.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
=== SNIP ==
FreeBSD/i386 (pentarou.parodius.com) (ttyd0)
login: fxp0: device timeout
ahc0: Recovery Initiated>>>>>>>>>>>>>>>>>> Dump
Card State Begins
<<<<<<<<<<<<<<<<<
ahc0: Dumping Card State in Message-out phase, at SEQADDR 0x16c
Card was paused
ACCUM = 0xa0, SINDEX = 0x61, DINDEX = 0xe4, ARG_2 = 0x10
HCNT = 0x0 SCBPTR = 0x1f
SCSIPHASE[0x4] SCSISIGI[0xb6] ERROR[0x0] SCSIBUSL[0x41]
LASTPHASE[0xa0] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0]
SEQCTL[0x10] SEQ_FLAGS[0x40] SSTAT0[0x2] SSTAT1[0x1]
SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac]
SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89]
STACK: 0xe2 0xe2 0xe2 0x179
SCB count = 130
Kernel NEXTQSCB = 98
Card NEXTQSCB = 124
QINFIFO entries: 124 76 86 37 106 30 87 80 59 104 127 110 22
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
Sequencer Free SCB List: 24 10 4 12 17 22 15 5 7 28 8 18 30 6 23 26 14 21 1 19
27 29 11 25 3 2 9 13 20 16 0
Sequencer SCB Info:
0 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
1 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
4 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
5 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
7 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
8 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
9 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
10 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
11 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
12 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
13 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
14 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
15 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
16 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
17 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
18 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
19 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
20 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
21 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
22 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
23 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
24 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
25 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
26 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
27 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
28 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
29 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
30 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
31 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x41]
Pending list:
22 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0]
110 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
127 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0]
104 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0]
59 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
80 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
87 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
30 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
106 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
37 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
86 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
76 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
124 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
65 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0]
Kernel Free SCB list: 5 118 81 94 27 97 14 56 1 115 93 128 33 41 31 36 12 54 64
79 4 55 63 107 70 119 15 39 77 69 66 17 67 95 58 16 75 100 53 29 47 125 60 111
71 10 129 114 82 25 35 99 117 83 44 38 123 92 74 126 90 85 50 46 32 68 45 21 48
102 96 57 42 89 43 78 109 62 23 72 0 116 120 2 11 105 20 103 52 101 26 24 121 51
122 40 112 18 34 84 73 13 7 91 28 108 19 9 8 88 3 61 49 6 113
<<<<<<<<<<<<<<<<< Dump Card
State Ends
>>>>>>>>>>>>>>>>>>
(da0:ahc0:0:0:0): SCB 0x16 - timed out
sg[0] - Addr 0x6b39000 : Length 4096
sg[1] - Addr 0x6bba000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x6e - timed out
sg[0] - Addr 0x9f79000 : Length 4096
sg[1] - Addr 0x1b63a000 : Length 2560
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x7f - timed out
sg[0] - Addr 0x64e7000 : Length 4096
sg[1] - Addr 0x64c8000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x68 - timed out
sg[0] - Addr 0x88e6000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x3b - timed out
sg[0] - Addr 0x5e05000 : Length 4096
sg[1] - Addr 0x2fa6000 : Length 2048
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x50 - timed out
sg[0] - Addr 0xef5b000 : Length 4096
sg[1] - Addr 0xf05c000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x57 - timed out
sg[0] - Addr 0x14c7000 : Length 4096
sg[1] - Addr 0x111a8000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x1e - timed out
sg[0] - Addr 0x19de9000 : Length 4096
sg[1] - Addr 0x1cdaa000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x6a - timed out
sg[0] - Addr 0x725d000 : Length 4096
sg[1] - Addr 0x709e000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x25 - timed out
sg[0] - Addr 0x1db1b000 : Length 4096
sg[1] - Addr 0xf95c000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x56 - timed out
sg[0] - Addr 0x91b1000 : Length 4096
sg[1] - Addr 0x94f2000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x4c - timed out
sg[0] - Addr 0x7087000 : Length 4096
sg[1] - Addr 0x6f48000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x7c - timed out
sg[0] - Addr 0x63b1000 : Length 4096
sg[1] - Addr 0x66b2000 : Length 4096
(da0:ahc0:0:0:0): Other SCB Timeout
(da0:ahc0:0:0:0): SCB 0x41 - timed out
sg[0] - Addr 0x62cd000 : Length 4096
sg[1] - Addr 0x656e000 : Length 4096
(da0:ahc0:0:0:0): BDR message in message buffer
ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
ahc0: Recovery Initiated>>>>>>>>>>>>>>>>>> Dump
Card State Begins
<<<<<<<<<<<<<<<<<
ahc0: Dumping Card State in Message-out phase, at SEQADDR 0x16c
Card was paused
ACCUM = 0xa0, SINDEX = 0x61, DINDEX = 0xe4, ARG_2 = 0x10
HCNT = 0x0 SCBPTR = 0x1f
SCSIPHASE[0x4] SCSISIGI[0xb6] ERROR[0x0] SCSIBUSL[0x6]
LASTPHASE[0xa0] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0]
SEQCTL[0x10] SEQ_FLAGS[0x40] SSTAT0[0x2] SSTAT1[0x1]
SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac]
SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89]
STACK: 0xe2 0xe2 0xe2 0x179
SCB count = 130
Kernel NEXTQSCB = 98
Card NEXTQSCB = 124
QINFIFO entries: 124 76 86 37 106 30 87 80 59 104 127 110 22
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
Sequencer Free SCB List: 24 10 4 12 17 22 15 5 7 28 8 18 30 6 23 26 14 21 1 19
27 29 11 25 3 2 9 13 20 16 0
Sequencer SCB Info:
0 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
1 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
4 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
5 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
7 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
8 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
9 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
10 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
11 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
12 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
13 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
14 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
15 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
16 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
17 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
18 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
19 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
20 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
21 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
22 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
23 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
24 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
25 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
26 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
27 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
28 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
29 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
30 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
31 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x41]
Pending list:
22 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0]
110 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
127 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0]
104 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0]
59 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
80 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
87 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
30 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
106 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
37 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
86 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
76 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
124 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0]
65 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0]
Kernel Free SCB list: 5 118 81 94 27 97 14 56 1 115 93 128 33 41 31 36 12 54 64
79 4 55 63 107 70 119 15 39 77 69 66 17 67 95 58 16 75 100 53 29 47 125 60 111
71 10 129 114 82 25 35 99 117 83 44 38 123 92 74 126 90 85 50 46 32 68 45 21 48
102 96 57 42 89 43 78 109 62 23 72 0 116 120 2 11 105 20 103 52 101 26 24 121 51
122 40 112 18 34 84 73 13 7 91 28 108 19 9 8 88 3 61 49 6 113
<<<<<<<<<<<<<<<<< Dump Card
State Ends
>>>>>>>>>>>>>>>>>>
(da0:ahc0:0:0:0): SCB 0x41 - timed out
sg[0] - Addr 0x62cd000 : Length 4096
sg[1] - Addr 0x656e000 : Length 4096
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 14 SCBs aborted
ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
fxp1: device timeout